This paper describes the implementation of a compiler for the programming language C. The compiler has
been designed to be capable of producing assembly-language code for most register-oriented machines
with only minor recoding. Most of the machine-dependent information used in code generation is
contained in a set of tables which are constructed automatically from a machine description provided by
the implementer. In the machine description, the implementer defines a machine-dependent abstract
machine for which the code generator produces intermediate code. The abstract machine is abstract in
that it is a C machine: its registers and memory are defined in terms of primitive C data types and its
instructions perform basic C operations. The abstract machine is machine-dependent in that there is a
close correspondence between the registers of the abstract machine and those of the target machine,
and between the behavior of the abstract machine instructions and the corresponding target machine
instructions or instruction sequences. The implementer defines the translation from an abstract machine
program to a target machine program by providing in the machine description a set of simple macro
definitions for the abstract machine instructions. In addition, macro definitions may be [...] as needed.


1. Introduction


This paper describes the implementation of a compiler for the programming language C [1,2], an
implementation language developed at Bell Laboratories and a descendant of the language BCPL [3]. The
compiler has been designed to be capable of producing assembly-language code for most
register-oriented machines with only minor recoding. Versions of the compiler exist for the Honeywell
HIS-6000 and Digital Equipment Corporation PDP-10 computers.


C is a procedure-oriented language. It has four primitive data types (integers, characters, and single- 
and double-precision floating-point), four data type constructors (pointers, arrays, functions, and records), 
and a small but convenient set of control structures which encourage goto-less programming. An
important characteristic of C is the minimal run-time support needed. Although C supports recursive
procedures, C does not have built-in functions, I/O statements, block structure, string operations, dynamic
arrays, dynamic storage allocation, or run-time type checking. The only run-time data structure is the
stack of procedure activation records. Of course, to run any useful programs, an interface to the
operating system is required, and a standard set of I/O routines has been defined in order to encourage
portability. But the implementation of these routines is optional and separate from the task of 
implementing a C compiler which produces code for a given machine. 


The compiler described in this paper was designed to be portable, that is, to be capable of generating 
code for many target machines with a minimum of recoding. When considering portability, three classes of
machines can be defined: 


1. Machines which can support C programs reasonably efficiently: This class of machines depends only
upon one’s interpretation of the term “reasonably efficiently.” Clearly, all real machines can run C
programs, limited only by some size constraint related to the availability of memory. However, the 
following capabilities are desirable: (1) the ability to access the current procedure activation record 
and the current argument list in a reentrant manner - this will require one or two base/index
registers depending upon the calling sequence, (2) the ability to reference via a pointer variable - 
this will require another base/index register or an indirection facility, (3) character addressing, (4) 
integer arithmetic, and (5) floating-point arithmetic. Not all of the above capabilities need be present 
in the target machine; however, the more that are missing, the more interpretive becomes the 
execution of a C program. For example, the HIS-6000 is word-addressed; thus references to
character variables are interpreted by a small run-time subroutine. 


2. Machines for which the compiler can produce reasonably efficient code: This class of machines is 
clearly a subset of the first class; the size of the subset is again determined by one’s definition of
reasonable. The better the correspondence between the target machine and the machine model 
implicit in the compiler, the better will be the object code produced. On the other hand, if the 
correspondence is poor, the compiler may be able to produce only threaded code or instructions to 
be interpreted by software. 


3. Machines which can support the compiler itself: Because the compiler is written in C, one may think 
that this class of machines is identical to the second class of machines; however, there are added 
restrictions which must be made in order to run the compiler on a given machine: the word size of 
the machine must be sufficient to hold all values used by the compiler; any implementation restriction 
on the size of procedures or data areas (as would be likely on the IBM S/360 because of addressing 
deficiencies) must not be such as to prohibit the proper execution of the compiler (this includes the
ability of the compiler to compile itself). In addition, there are operating system and configuration 
restrictions: the memory size available to a program must be sufficient to hold the phases of the 
compiler; file space for the source of the compiler must be available and affordable; the I/O routines 
used by the compiler must be implemented. This class of machines is not a subset of the second class 
of machines since the compiler does not use all of the features of the language, notably floating-point. 


This paper concentrates on the second class of machines, those for which the compiler can produce
reasonably efficient code, given the restrictions of the first class of machines, those which can support C
programs reasonably efficiently. Thus, throughout the paper, the term “machine independence” will
generally refer to [...]


1.1 Motivation 


One of the serious problems in the field of software engineering is the difficulty of transferring programs
to new machines. This is caused in large part by the proliferation of different programming languages
and machines and the significant effort required to implement a compiler for any particular programming
language and target machine. One approach to solving this problem is to restrict languages
to a few standardized languages which are then implemented on all target machines of interest. A
disadvantage of this approach is that it conflicts with the desirability of having many specialized
languages for specialized problems. Another disadvantage is the fact that continual progress is being
made in the development of programming languages so that by the time a language is standardized and
widely available, it is already “obsolete.” It is also difficult to achieve compatibility among the various
implementations of a standardized language. Even if the standard language is well defined, it is difficult
for compiler writers to restrain themselves from extending it and for users to restrain themselves from
using the language extensions. A similar approach to the problem of program transferability is to restrict
the number of target machines for which compilers must be written by requiring that each new machine
be compatible with a widely-used existing machine. The stifling of progress in computer architecture
which would result from this requirement is as undesirable as the stifling of progress in programming
languages which would result from adoption of the previous approach. In addition, if the new machines
are only [...]
transferring programs from new machines to old ones.


An alternative approach to those of language restriction and machine compatibility is to develop
techniques that reduce the effort required to write compilers for various combinations of languages and
machines. These techniques may be directed at two subproblems, that of reducing the effort involved in
writing one particular compiler and that of reducing the effort involved in writing a family of related
compilers. The development of such techniques could have benefits in addition to improving program
transferability, [...] a new language or making languages more widely available.


An early effort in this direction was an attempt to devise a universal computer-oriented language UNCOL
[4], which is both language-independent and machine-independent, to which all programming languages
could be translated and which itself could be translated with acceptable efficiency into any machine
language. The idea was that one need write only one UNCOL-to-machine-language translator for each
target machine and one source-language-to-UNCOL translator for each source language, rather than
having to write one compiler for each source language-machine language combination. In addition, if
UNCOL were well defined, then the various implementations of UNCOL could be made compatible, thereby
insuring the compatibility of the source language implementations. Unfortunately, the concept of a
universal language has not led to a practical solution of the problem; the characteristics of source and
machine language independence are incompatible with the need for acceptably efficient translation from
UNCOL to machine language.


More practical techniques for reducing the effort involved in writing compilers result if one considers
techniques with more limited goals than those of the UNCOL project. One approach is to develop
techniques which reduce the effort involved in writing one particular compiler for some language-machine
combination. Examples of such techniques are parser generators and syntax-directed symbol processors
[5]. Another approach is to develop techniques for writing families of compilers for many source
languages and one target machine. An example of such a technique is a compiler writing system with
code generation primitives, such as FSL [6]. The third approach, and the one which is taken in this work,
is that of the portable compiler, a compiler for a particular source language which can produce code for
many target machines. It should be noted that techniques such as parser generators, which can aid in the
implementation of a single compiler, can be equally useful in the implementation of more general systems
such as compiler writing systems and portable compilers.


1.2 Background


A compiler can be considered to consist of two logical phases, analysis and generation. The analysis 
phase performs lexical and syntactic analysis of the source program, producing as output some convenient 
internal representation of the program, along with a set of tables containing lexical information and other 
information derived from the declarative statements of the program. The generation phase then 
transforms the internal representation into an object language program, using the information contained in 
the tables produced by the analysis phase. One can confine the machine (object language) dependencies 
of a compiler to the generation phase by a suitable choice of internal representation, i.e. one which is 
machine-independent. On the other hand, it is not practical to also confine the source language
dependencies of a compiler to the analysis phase since this would make the internal representation a 
universal language. Thus the generation phase of a compiler is both source-language-dependent and 
machine-dependent.


Most portable compilers require that the generation phase be completely rewritten for each target 
machine [7,8]. This effort may represent only about one-fifth of the effort needed to rewrite the entire
compiler [8]. In the case of the BCPL compiler [9], for example, moving the compiler may require only 
three to four weeks under ideal conditions (but otherwise may require up to five months). However, it 
would be desirable if the amount of recoding necessary to generate code for a new machine could be 
reduced. 


One approach is that advocated by Poole and Waite for writing portable programs [10,11]. They
advocate that before writing a program to solve a particular problem, one define an abstract machine for 
which the program is then written. With this approach, in order to move the program to a new machine, 
one need only implement the abstract machine on the target machine, typically via a macro processor. 
The desired qualities of the abstract machine are that it contain operations and data objects convenient 
for expressing the problem solution, that it be sufficiently close to the target machines of interest so that 
acceptable code can easily be generated, and that the tools for implementing the abstract machine be 
easily obtainable on the target machines. 


This technique can be applied to portable compilers by considering the problem to be the implementation 
of an arbitrary source language program. The operations and data objects convenient for expressing the
problem solution are then those which are basic to the source language. With this technique, a compiler
would be broken into two parts: a machine-independent translator from the source language to the 
abstract machine language and a machine-dependent translator from the abstract machine language to the 
target machine language. The translator from the abstract machine language to the target machine 
language should be smaller and simpler than the conventional generation phase would be; typically, it 
consists of a set of macro definitions which map each abstract machine instruction into the corresponding 
target machine instruction or instruction sequence. Moving the compiler to a new machine simply requires 
rewriting the macro definitions. 
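
The retargeting step described above can be illustrated with a minimal sketch in C. Everything in it is an assumption made for illustration: the three abstract instructions, the `$` operand marker, the two sets of target mnemonics, and the function name `expand` are hypothetical and are not taken from any compiler discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstract machine opcodes (illustration only). */
enum am_op { AM_LOAD, AM_ADD, AM_STORE, AM_NOPS };

/* One "macro definition" per abstract instruction and per target machine.
   '$' marks where the operand is substituted; all mnemonics are invented. */
static const char *const target_a[AM_NOPS] = { "LDA $", "ADA $", "STA $" };
static const char *const target_b[AM_NOPS] = {
    "MOVE R1,$", "ADD R1,$", "MOVEM R1,$"
};

/* Expand one abstract instruction into target assembly text.
   Returns the length of the expansion, or -1 if it does not fit. */
int expand(const char *const *macros, enum am_op op,
           const char *operand, char *out, size_t outsize)
{
    size_t n = 0;
    const char *t;

    assert(out != NULL && outsize > 0);
    for (t = macros[op]; *t != '\0'; t++) {
        if (*t == '$') {
            size_t len = strlen(operand);
            if (n + len >= outsize)
                return -1;
            memcpy(out + n, operand, len);
            n += len;
        } else {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = *t;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

In this sketch, moving to a new machine means supplying another table such as `target_b`; the `expand` routine and the program that emits abstract instructions are left unchanged.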


The major difficulty with the abstract machine approach to portable software is in determining the 
appropriate abstract machine. If the abstract machine is of a high level (i.e., very problem-oriented), then 
the program will be very portable but the implementation of the abstract machine will be difficult. On the 
other hand, if the abstract machine is of a low level (i.e., more machine-oriented), then, unless it
corresponds closely to the target machine, either the code produced will be inefficient or the 
implementation will be complicated by optimization code. 


The solution to this difficulty proposed by Poole and Waite is to define a hierarchy of abstract machines, 
ranging from a high-level problem-oriented abstract machine to a low-level, machine-oriented, and
easy-to-implement abstract machine. In this solution, the higher-level abstract machines are implemented in
terms of the lower-level abstract machines, and only the lowest-level abstract machine need be 
implemented on a target machine in order to transfer the program; once it is transferred, higher-level 
abstract machines may be implemented directly in terms of the target machine in order to improve 
efficiency. While this technique may be useful for transferring particular programs, it is unlikely that it
will be acceptable in practical terms as a compilation technique because of the need for additional
translation steps. An experiment by Brown [12] indicates that one may implement and then optimize a
low-level abstract machine in about the same time as it takes to implement a higher-level abstract
machine and that the resulting implementations are similarly efficient. Thus an alternative solution is to
use a low-level abstract machine, but allow the implementer to optimize as desired; this solution is more
likely to be acceptable as a compilation technique. A third solution will be advocated in this paper.


The technique of rewriting the generation phase requires that a non-trivial translator from the internal
representation to the target machine language be written for each new target machine. Similarly, the
abstract machine approach requires that a translator from the abstract machine language to the target
machine language be written for each new target machine; if reasonably efficient code is desired and the
abstract machine does not correspond very closely to the target machine, then this translator will also be
non-trivial.


A more desirable goal for a portable compiler is that it have a generation phase which can be modified to
produce code for a new target machine by a process which is largely automatic. Implicit in this goal is
the requirement that the modification process obtain its knowledge about the target machine from a
(non-procedural) description of the machine. An early effort in this direction was the SLANG system [13],
which attacked the problem of describing a machine-dependent process (code generation) in a
machine-independent way. In the SLANG system, source language programs are translated into a set of basic
operations called EMILs; the EMILs are translated into absolute machine code using macro definitions and
instruction format definitions. The approach is similar to the abstract machine approach in that the EMILs
[...] one will not be able to achieve the desired close correspondence between the abstract machine and most
[...] machines. Nevertheless, the method of [...] the instructions of a machine by [...]


More recently, Miller [14] has explored the problem of constructing a code generator from a machine
description. Miller proposes that a generation phase be constructed in two steps. In the first step, the
language designer specifies the language-dependent part of the generation phase by writing a set of
procedural machine-independent macro definitions for the operations of the internal representation
produced by the analysis phase. These macro definitions define the operations of the internal
representation, such as addition, in terms of machine-independent (i.e., language-dependent) primitives, such
as integer addition, which are created by the language designer. In the second step, the implementer
provides a description of the target machine which is used by an automatic code generation system
named DMACS (Descriptive Macro System) in order to fill out the macro definitions of the first step and
thereby produce a code generator for the target machine. As was the case with the SLANG system, the
DMACS machine description defines the primitive operations by giving target machine code sequences
which interpret them. In addition, however, the permitted locations of the operands (in terms of their
being in memory or in particular registers) are specified, as are the corresponding result locations. Thus
the primitives can be made to correspond very closely to the instructions of the target machine so that
the code sequences in the machine description are simpler and the resulting object code is more efficient.


Both the SLANG system and DMACS are intended to be general in that they are not designed for a
specific source language. However, true generality is difficult to obtain and the systems do reflect
preconceived notions about source languages. It is believed that since there are much more significant
[...] among languages [...] among machines, a practical implementation of a compiler for any
particular language requires [...] be designed specifically for that language. This idea was
recognized to some extent in DMACS, where the primitives are created by the language designer as
convenient for expressing the operations of the source language. On the other hand, DMACS contains no
notion of storage classes (different mechanisms for accessing variables of the same data type) which are
needed for C; the implementation of storage classes is machine-dependent and thus must be defined in
the machine description. In this paper, techniques similar to those used in the SLANG system and in
DMACS are used in the implementation of a portable C compiler.


1.3 Method


The goal of this research is to design a generation phase for a C compiler which can be modified to 
produce code for many machines by a process which is largely automatic. Some insight into this problem 
can be gained by examining the corresponding, but better understood problem of the automatic 
construction of an analysis phase. One common approach is the use of a parser generator [15]. A parser 
generator is a program which accepts as input a grammar for a source language and produces as output a 
set of tables which are used by a language-independent parsing algorithm. The parsing algorithm is 
supplemented by a set of action routines which are provided by the implementer; these action routines 
are called by the parsing algorithm at appropriate points to produce the output of the analysis phase. 
The important characteristics of this process are as follows: 


1. The analysis phase is divided into two parts, a language-independent part (the parsing algorithm) and 
a language-dependent part (the parsing tables and the action routines). 


2. The language-dependent tables are constructed automatically from a finite description of the language 
(the grammar). 


3. The analysis phase is “filled-in” by the implementer by providing information in a procedural form (the 
action routines). 


4. The choice of a specific parsing algorithm determines the class of languages which can be handled by 
the analysis phase. 
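

The four characteristics listed above can be made concrete with a small sketch in C. It is an illustration under stated assumptions only: a finite-state transition table stands in for real parsing tables, the toy language (sums of single digits such as 1+2+3) is invented, and every identifier is hypothetical. The driver `parse` plays the role of the language-independent algorithm, `table` plays the role of the generated language-dependent tables, and `act_add` is an action routine of the kind supplied by the implementer.

```c
#include <assert.h>
#include <stddef.h>

/* Token classes for a toy language of sums such as "1+2+3". */
enum tok { T_DIGIT, T_PLUS, T_END, T_OTHER, T_NTOKS };

/* States of the table-driven recognizer. */
enum { S_ERR = -1, S_OPERAND = 0, S_OPERATOR = 1, S_ACCEPT = 2 };
#define NSTATES 2

typedef void (*action_fn)(int ch, long *acc);

/* Action routines: the procedural part supplied by the implementer. */
static void act_add(int ch, long *acc) { *acc += ch - '0'; }
static void act_none(int ch, long *acc) { (void)ch; (void)acc; }

struct entry { int next; action_fn action; };

/* Language-dependent table; a generator would emit this from a description. */
static const struct entry table[NSTATES][T_NTOKS] = {
    /* S_OPERAND  */ { { S_OPERATOR, act_add }, { S_ERR, act_none },
                       { S_ERR, act_none },     { S_ERR, act_none } },
    /* S_OPERATOR */ { { S_ERR, act_none },     { S_OPERAND, act_none },
                       { S_ACCEPT, act_none },  { S_ERR, act_none } }
};

static enum tok classify(int ch)
{
    if (ch >= '0' && ch <= '9') return T_DIGIT;
    if (ch == '+') return T_PLUS;
    if (ch == '\0') return T_END;
    return T_OTHER;
}

/* Language-independent driver: knows nothing about sums or digits.
   Returns 1 and stores the sum on success, 0 on a syntax error. */
int parse(const char *s, long *result)
{
    int state = S_OPERAND;

    assert(s != NULL && result != NULL);
    *result = 0;
    for (;;) {
        int ch = (unsigned char)*s;
        const struct entry *e = &table[state][classify(ch)];
        if (e->next == S_ERR)
            return 0;
        e->action(ch, result);
        if (e->next == S_ACCEPT)
            return 1;
        state = e->next;
        s++;
    }
}
```

The choice of driver fixes what the table can express: this one can only recognize regular languages, which mirrors the fourth point above on a very small scale.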


The process of constructing an analysis phase can be made more automatic through the use of a compiler
writing system. In a compiler writing system, the action routines are in a sense built-in; the implementer 
invokes these action routines from a higher-level description of the translation. The use of such a system 
may involve much less effort than would be required to write a complete set of action routines. However, 
the important point here is that the use of built-in knowledge, as opposed to allowing the addition of 
arbitrary procedural knowledge, restricts the class of translations (and thus source languages) which can 
be handled by the automatically generated analysis phase. 


For the compiler described in this paper, techniques analogous to those described in the preceding
paragraph are used in the implementation of the generation phase. The generation phase is split into two 
parts, a machine-independent part and a machine-dependent part. The machine-independent part of the 
generation phase is a machine-independent code generation algorithm, corresponding to the language- 
independent parsing algorithm of the analysis phase. Just as the choice of a particular parsing algorithm 
limits the class of languages that the analysis phase can handle (the parsing algorithm is not completely 
language-independent), the choice of a particular code generation algorithm determines the class of 
machines for which the compiler can produce reasonable (non-interpretive) code. The machine-dependent 
part of the generation phase consists of a set of tables produced automatically by a stand-alone program 
GT (Generate Tables) from a machine description, which corresponds to the grammar in the construction of 
an analysis phase. The information contained in the machine description may be supplemented by a set of 
routines which correspond to the action routines of the analysis phase. However, the compiler described 
in this paper is closer to the compiler writing system approach in that implementer-supplied routines form 
only a minor part of the generation phase. The extent to which the implementer can easily and safely 
include such routines in the generation phase represents another factor determining the class of target 
machines handled. 


A code generation algorithm, if it is to be machine-independent, requires a model of a machine with which
to work. This model may express such notions as memory, registers, addressing, operations, and
hardware data types. In the machine description, the implementer defines his target machine in terms of
this model and also specifies the form of the object language. The class of machines for which the code
generator can produce acceptable code directly corresponds to the generality of the machine model.


The machine model used by the C compiler is a C machine: a machine whose registers and memory are
described in terms of the primitive C data types and whose operations are primitive C operations. The
implementer models the target machine in terms of a C machine, producing an abstract machine. The
abstract machine may be very similar to or very different from the target machine, depending upon how
closely the target machine fits the machine model. The code generation algorithm, using its machine
model, produces code for the abstract machine. The “assembly” language of the abstract machine is called
the intermediate language; an intermediate language program, which is in the form of a series of macro
calls, is translated into the target machine assembly language using a set of macro definitions provided by
the implementer in the machine description. Assembly language was chosen over machine language for
the output of the compiler because it is far easier to [...] and produce in a machine-independent
manner than machine code or object modules.


The abstract C machine plays the same role in the C compiler as would a Poole and Waite abstract
machine. The difference is that instead of there being one fixed abstract machine, there is a class of
abstract machines, corresponding to the variability in the machine model. This variability allows the
implementer to define a particular abstract machine which more closely resembles his target machine.
The result is that the translation from the abstract machine language to the target machine language
becomes simpler, and more efficient code is produced.


The process of modeling the target machine is described in chapter two. A detailed discussion of the
code generation algorithm is presented in chapter three. Conclusions are presented in chapter four. 


2. Modeling the Target Machine


The code generator’s model of a machine is an abstract C machine, a machine whose instructions perform
the primitive operations of the C language. The data types of the abstract machine are the primitive C
data types (characters, integers, and single- and double-precision floating point), supplemented by one or
more pointer classes which are distinguished by their ability to resolve addresses. The basic addressable
unit of the abstract machine memory is the byte, which holds a single character value (characters are the
smallest C data type). Values of the other abstract machine data types occupy an integral number of
bytes, possibly aligned in larger units of memory. The abstract machine has a set of registers which may
be used to hold the operands of the abstract machine instructions. Each abstract machine register is
capable of holding values of some subset of the abstract machine data types. The instructions of the
abstract machine are three-address instructions. Each address may specify an abstract machine register
or a location in memory; the mechanisms for referencing memory locations correspond to the primitive
addressing modes in C.


In the machine description, the implementer describes the target machine in terms of this machine model
by defining a particular abstract machine for which the code generator produces intermediate code. The
implementer specifies the sizes and alignments of the primitive C data types and defines pointer classes
as convenient. The implementer defines the abstract machine registers, which generally correspond to
those registers of the target machine which are to be used in the evaluation of expressions. The
implementer also specifies the registers which may hold values of each of the abstract machine data
types. In addition, the implementer may specify that any two abstract machine registers conflict in the
target machine, meaning that only one may hold a value at any one time. The implementer defines the
abstract machine instructions in terms of their operand/result locations and possible side-effects on other
registers. In addition, the implementer provides a set of macro definitions which implement the abstract
machine instructions on the target machine.
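
As a rough illustration of the kind of information such a description carries, the following C sketch records type sizes and alignments, the data types each register may hold, and register conflicts. It is not the table format produced by GT, which is not shown here; the structure layout, the example machine (registers A, Q, and AQ with 4-byte words), and all function names are assumptions made for the example.

```c
#include <assert.h>

/* Abstract machine data types: the primitive C types plus two
   hypothetical pointer classes (byte-aligned and word-aligned). */
enum am_type { AM_CHAR, AM_INT, AM_FLOAT, AM_DOUBLE, AM_P0, AM_P1, AM_NTYPES };

#define TYPE_BIT(t) (1u << (t))
#define MAX_REGS 8

struct am_register {
    const char *name;    /* name used in the target assembly language */
    unsigned holds;      /* bit set of am_type values the register may hold */
    unsigned conflicts;  /* bit set of register numbers it conflicts with */
};

struct machine_desc {
    int size[AM_NTYPES];   /* size of each type in bytes */
    int align[AM_NTYPES];  /* alignment of each type in bytes */
    int nregs;
    struct am_register reg[MAX_REGS];
};

/* An invented machine: AQ is a register pair overlapping A and Q. */
static const struct machine_desc example = {
    { 1, 4, 4, 8, 4, 4 },
    { 1, 4, 4, 4, 4, 4 },
    3,
    {
        { "A",  TYPE_BIT(AM_CHAR) | TYPE_BIT(AM_INT) |
                TYPE_BIT(AM_P0) | TYPE_BIT(AM_P1),       1u << 2 },
        { "Q",  TYPE_BIT(AM_CHAR) | TYPE_BIT(AM_INT),    1u << 2 },
        { "AQ", TYPE_BIT(AM_FLOAT) | TYPE_BIT(AM_DOUBLE),
                (1u << 0) | (1u << 1) }
    }
};

/* May register r hold a value of type t? */
int reg_can_hold(const struct machine_desc *m, int r, enum am_type t)
{
    assert(r >= 0 && r < m->nregs);
    return (int)((m->reg[r].holds >> t) & 1u);
}

/* Do registers a and b conflict (only one may hold a value at a time)? */
int regs_conflict(const struct machine_desc *m, int a, int b)
{
    assert(a >= 0 && a < m->nregs && b >= 0 && b < m->nregs);
    return (int)((m->reg[a].conflicts >> b) & 1u);
}

/* Round a byte offset up to the alignment required by type t. */
int align_offset(const struct machine_desc *m, int offset, enum am_type t)
{
    int a = m->align[t];
    return (offset + a - 1) / a * a;
}
```

A code generator working from such a record can ask, for instance, whether a double may be placed in `Q` or whether loading `AQ` destroys a value held in `A`, without containing any knowledge of the particular machine.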


2.1 The Intermediate Language 


The intermediate language is the assembly language of the abstract machine. Using the information
contained in the tables constructed from the machine description, the code generator produces a
translation of the source program in the intermediate language. An intermediate language program
consists of a sequence of macro calls, each of which is expanded into one or more object language
statements using the macro definitions provided in the machine description. There are two types of
macros in the intermediate language. The first type are macros which represent the three-address
abstract machine instructions. The second type are keyword macros which correspond to either
assembly-language pseudo-operations or instructions implementing the primitive C control structures.


2.1.1 Abstract Machine Instructions 


The abstract machine instructions are three-address instructions which perform the evaluation of C
expressions. The operators of the abstract machine instructions are called abstract machine operators
(AMOPs); the addresses are called references (REFs).


2.1.1.1 AMOPs 


AMOPs are basic C operations which are qualified by the specific abstract machine data types of their
operands. For example, the following AMOPs correspond to the C operator ‘+’:


+i      integer addition
+d      double-precision floating-point addition
+p0     addition of an integer to a pointer to a byte-aligned object
+p1     addition of an integer to a pointer to a word-aligned object


In addition, there are AMOPs for data movement, data type conversion, and conditional jumps. AMOPs are
represented in the compiler as an integer opcode with [...]. The various AMOPs are
listed in Appendix II.
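
The qualification of an operator by operand type can be sketched in C as follows. The enumeration values and the function `select_add` are hypothetical; in particular, the rule that characters are widened to integers and single precision to double precision before the addition is an assumption made for this sketch, and the actual opcode numbering used by the compiler is not shown in this section.

```c
#include <assert.h>

/* Abstract machine data types with two pointer classes (illustrative). */
enum am_type { AM_CHAR, AM_INT, AM_FLOAT, AM_DOUBLE, AM_PTR0, AM_PTR1 };

/* The four AMOPs for '+' named in the text; numbering is invented. */
enum amop { ADD_I, ADD_D, ADD_P0, ADD_P1, AMOP_COUNT };

static const char *const amop_names[AMOP_COUNT] = { "+i", "+d", "+p0", "+p1" };

const char *amop_name(enum amop op)
{
    assert((int)op >= 0 && (int)op < AMOP_COUNT);
    return amop_names[op];
}

/* Choose the AMOP for the C operator '+' from the type of its left
   operand. Assumes char has been widened to int and float to double. */
enum amop select_add(enum am_type left)
{
    switch (left) {
    case AM_PTR0:   return ADD_P0;
    case AM_PTR1:   return ADD_P1;
    case AM_FLOAT:
    case AM_DOUBLE: return ADD_D;
    case AM_CHAR:
    case AM_INT:    return ADD_I;
    }
    assert(!"unknown operand type");
    return ADD_I;
}
```

Under these assumptions, `select_add(AM_PTR1)` yields the operator printed as `+p1`, so the one source-level `+` fans out into several machine-level operations according to the operand type.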


2.1.1.2 REFs


A REF is a C-oriented description of the location of an operand or the result of an abstract machine
instruction. A REF may specify either a register of the abstract machine or a location in memory; the
possible classes of memory references include C variables of various storage classes (automatic, static,
external, parameter, temporary) as well as constants and indirect references. A REF is represented by a
pair of integers called REF.BASE and REF.OFFSET; REF.BASE determines either a particular register or a
particular class of memory references, REF.OFFSET determines the exact location given a specific memory
reference class. The possible values of REF.BASE are listed below with their interpretations (actual
integer values are shown for concreteness; the compiler itself uses manifest constants):


REF.BASE    interpretation

n ≥ 0    -  register n (register numbers are assigned to the registers of the abstract
            machine in a predictable manner by GT)
-1       -  an automatic or temporary variable; OFFSET is the offset of the variable in the
            stack frame
-2       -  an external variable, referenced by name; OFFSET is an internal identifier
            number
-3       -  a static (internal) variable; OFFSET is an internal static variable number
-4       -  a parameter; OFFSET is the offset of the variable or its address in the
            argument list
-5       -  a label; OFFSET is an internal label number
-6       -  an integer constant whose value is OFFSET
-7       -  a floating-point constant; OFFSET is an internal constant number
-8       -  a character string constant; OFFSET is an internal string number
n ≤ -9   -  reference indirect through a pointer in register number (-n - 9); OFFSET is the
            offset of the reference relative to the pointer


The specific values of REF.BASE need not be referred to in most macro definitions; the exception is the
NAME macro, which converts a REF into a symbolic address.
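
The encoding of REF.BASE can be sketched in C as follows. This is only an illustration, not the compiler's own code: the type and function names are hypothetical, and the sketch assumes that an indirect reference with base n <= -9 goes through register number -n - 9.

```c
/* A REF as a (base, offset) pair; the names here are illustrative only. */
struct ref {
    int base;    /* REF.BASE: register number or memory reference class */
    int offset;  /* REF.OFFSET: meaning depends on the class */
};

enum ref_kind { REF_REGISTER, REF_MEMORY_CLASS, REF_INDIRECT };

/* Classify a REF.BASE value: n >= 0 is a register, -1..-8 is a
   memory reference class, and n <= -9 is an indirect reference. */
enum ref_kind ref_kind_of(int base)
{
    if (base >= 0)
        return REF_REGISTER;
    if (base <= -9)
        return REF_INDIRECT;
    return REF_MEMORY_CLASS;
}

/* For an indirect REF (base <= -9), recover the pointer register number
   (assumed mapping: -9 is register 0, -10 is register 1, and so on). */
int ref_indirect_register(int base)
{
    return -base - 9;
}
```

Under these assumptions a base of -9 denotes an indirect reference through register 0, and a base of -11 one through register 2.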


The representation of a three-address instruction in the intermediate language is that of a macro call with
five or seven integer arguments representing the AMOP and REFs for the result and the operands of the
AMOP. (Each REF consists of two arguments, REF.BASE and REF.OFFSET; only two REFs are provided in
the case of a unary AMOP.) The macro name used in the macro call is of a special form which specifies an
entry in a table produced from the machine description by the GT program; this table entry refers to the
representation of the corresponding macro definition from the machine description.
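
As a rough illustration of the five-or-seven argument form, the sketch below flattens an instruction into an integer argument list. The struct layout, the function name, and the argument order (opcode, result REF, then operand REFs) are assumptions made for the example; the text fixes only the argument counts.

```c
struct ref { int base, offset; };

struct instr {
    int amop;                          /* integer opcode of the AMOP */
    int unary;                         /* nonzero if the AMOP has one operand */
    struct ref result, first, second;  /* second is ignored when unary */
};

/* Store the macro-call arguments in args[] and return how many were
   written: 5 for a unary AMOP, 7 for a binary one. */
int instr_to_args(const struct instr *in, int args[7])
{
    int n = 0;
    args[n++] = in->amop;
    args[n++] = in->result.base;
    args[n++] = in->result.offset;
    args[n++] = in->first.base;
    args[n++] = in->first.offset;
    if (!in->unary) {
        args[n++] = in->second.base;
        args[n++] = in->second.offset;
    }
    return n;
}
```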


2.1.2 Keyword Macros 


Keyword macros are those macro calls which, along with the three-address instructions, make up an
intermediate language program. Unlike AMOP macros, whose names are generated by GT, the names of
the keyword macros are predefined, as are their functions. For example, keyword macros are used to
define external variable names and internal labels, to specify initial values in storage, and to produce the
function prologs and epilogs. The various keyword macros defined in the intermediate language are listed
below with a brief description of their functions. A more complete set of descriptions appears in the
Appendix.


macro      function

HEAD       produce header statements, if needed
ENTRY      define an entry point
EXTRN      define an external reference
INT        define an integer constant
CHAR       define a character constant
FLOAT      define a floating-point constant
NFLOAT     define a negative floating-point constant
DOUBLE     define a double-precision float constant
NDOUBLE    define a negative double-precision constant
ADCONn     define a class "n" pointer constant
STRCON     define a pointer referencing a string constant
EQU        define a symbol
ZERO       define an area of storage initialized to 0
STATIC     define a static variable
STRING     define the string constants
ALIGN      force an alignment of the location counter
LN         define a line-number symbol
LABCON     define a label constant
LABDEF     define an internal label
IDN        translate an internal identifier number
           into the corresponding assembler symbol
END        produce an end statement, if needed
PROLOG     produce the prolog code of a C function
EPILOG     produce the epilog code of a C function
CALL       produce a function call
RETURN     produce code for a return statement
GOTO       produce a jump to a label expression
LSWITCH    produce a switch jump (list version)
TSWITCH    produce a switch jump (table version)


The actual macro names which appear in an intermediate language program are abbreviations of the
names listed above.


2.2 The Machine Description 


The machine description is a “program” written in a special-purpose language from which are constructed
the machine-dependent tables of the generation phase. The machine description has two functions: (1) it
defines the particular abstract machine for which the code generator produces intermediate code, and (2)
it specifies the translation from an intermediate language program to the corresponding object language
program.


The abstract machine is defined in two sections of the machine description. First, a set of definition
statements defines the registers and memory of the abstract machine. Second, in the OPLOC section, the
AMOPs are defined in terms of their operand/result locations. The translation from the intermediate
language to the object language is specified by a set of macro definitions in the macro section of the
machine description. More information on the writing of a machine description may be found in Appendix
I; the machine description used in the HIS-6000 implementation is listed in Appendix IV.


2.2.1 Defining the Abstract Machine 


In the machine description, the implementer first defines the registers of the abstract machine. For
example, the statement


regnames (x0,x1,x2,x3,x4,a,q,f);


defines the eight abstract machine registers used in the HIS-6000 implementation. The registers X0
through X4 correspond to the first five of eight HIS-6000 index registers, the A and Q correspond to the
accumulators, and the F register is a fictitious floating-point accumulator which corresponds to the
combined A, Q, and E (exponent) registers on the HIS-6000. The fact that the F register conflicts in the
target machine with the A and Q registers is specified by the statement


conflict (a,f),(q,f); 


The remaining HIS-6000 index registers are not represented in the abstract machine since it was not
desired that they be used by the code generator in the evaluation of expressions; two of those registers
hold “environment pointers,” the other is used as a scratch register by some of the macro definitions.
There is nothing that requires that the abstract machine registers be implemented as actual machine
registers on the target machine; they may also be implemented as fixed memory locations.


For convenience, the abstract machine registers can be gathered into classes; for example, in the
HIS-6000 implementation, the statement


class x(x0,x1,x2,x3,x4), r(a,q);


defines the class of index registers X and the class of general registers R.


The implementer also defines the classes of abstract machine pointers. Pointer classes are necessary on
machines which are not byte-addressed since pointers to byte-aligned objects will be handled differently
than pointers to word-aligned objects. In the HIS-6000 machine description, the statement


pointer p0(1), p1(4);


defines the class P0 of byte pointers and the class P1 of word pointers. The "4" indicates that the value
of a P1 pointer is always a multiple of four bytes. The fact that there are four bytes per word on the
HIS-6000 is specified in the statement


size 1(char), 4(int,float), 8(double);


A similar statement is used to specify the alignment restrictions.


The statement


type int(r), char(r), float(f), double(f), p0(r), p1(x);


defines the registers which can hold values of each of the abstract machine data types. For example, in
the HIS-6000 implementation, word pointers are held in the index registers X and byte pointers are held
in the general registers R.


The definition of the abstract machine is completed in the OPLOC section of the machine description
where the implementer specifies the behavior of the abstract machine operations in terms of their
operand/result locations. For example, the location definition


+d: f,M,f;


specifies that the AMOP '+d' (double-precision floating-point addition) can take its first operand in the F
register and its second operand in any memory location and, under these circumstances, the result is
placed in the F register. The construct on the right in the location definition is called an OPLOC; it
consists of three location expressions, one for the first operand, second operand, and result (reading from
left to right). A location expression may specify any set of abstract machine registers or any set of
memory reference classes; for example, the location expression 


r|x 


represents the set consisting of the general registers R and the index registers X, and the location 
expression 


~intlit


represents the set consisting of all memory reference classes except that of integer constants. An OPLOC 
may specify that the result is placed in the first or second operand location. For example, the location 
definition 


+i: r,M,1; 


specifies that the AMOP °+i’ (integer addition) takes its first operand in a general register and its second 
Operand in any memory location, and the result is placed in the register which contained the first 
operand. This location definition is equivalent to 


+i: a,M,a; q,M,q;


which explicitly lists the two alternatives. An OPLOC may also specify that the contents of certain
registers are destroyed during the execution of an AMOP; for example, the location definition


*i: q,M,q [a]


specifies that an integer multiplication destroys the contents of the A register.
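
One way to picture an OPLOC is as a small record of location sets. The sketch below is an assumption-laden illustration rather than the layout produced by GT: locations are bit sets, the register order follows the regnames statement, and the eight memory-class bits and all names are hypothetical. The function expands an OPLOC whose result is tied to the first operand, such as the general-register form of '+i', into one explicit alternative per register.

```c
typedef unsigned locset;            /* one bit per register or memory class */

enum { X0, X1, X2, X3, X4, A, Q, F, NREGS };   /* order of the regnames list */
#define REG(r)   (1u << (r))
#define MEM_ALL  (0xFFu << NREGS)   /* hypothetical: eight memory classes */

struct oploc {
    locset first, second;   /* acceptable operand locations */
    locset result;          /* used when result_in == 0 */
    int result_in;          /* 0: explicit set; 1 or 2: same place as operand */
    locset clobbers;        /* registers destroyed by the AMOP */
};

/* Expand an OPLOC whose result is left in the first operand location into
   one alternative per register of the first operand set.
   Returns the number of alternatives written to out[]. */
int expand_first_operand(const struct oploc *in, struct oploc out[NREGS])
{
    int n = 0;
    for (int r = 0; r < NREGS; r++) {
        if (in->first & REG(r)) {
            out[n] = *in;
            out[n].first = REG(r);
            out[n].result = REG(r);
            out[n].result_in = 0;
            n++;
        }
    }
    return n;
}
```

With the first operand set {A, Q}, the expansion yields the two alternatives that the text lists explicitly for the A and Q registers.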


2.2.2 Defining the Object Language


The translation from the intermediate language to the object language is specified by a set of macro
definitions included in the machine description; macro definitions are provided for the abstract machine
instructions and the keyword macros. The simplest form of a macro definition is a single character string
which is substituted for the macro call during macro expansion. For example, the macro definition for
floating-point unary minus used in the HIS-6000 implementation is


-ud: "    FNEG"


This macro definition specifies that each occurrence of a '-ud' abstract machine instruction is to be
translated into the assembly language instruction "FNEG" which complements the contents of the F
register. The macro definition for '-ud' is closely related to the location definition for '-ud',


-ud: f,,1;


which states that the operand is found in the F register and that the result is placed in the F register. A
macro definition for an AMOP can assume that the actual operand/result locations appearing in an
abstract machine instruction satisfy the constraints specified in the corresponding location definition; at
the same time, a macro definition must produce correct code for all combinations of operand/result
locations allowed by the location definition.


A macro definition for an abstract machine instruction can refer to symbolic representations of the
operation and the operand/result locations by using the character sequences #O (operation), #F (first
operand), #S (second operand), and #R (result). These character sequences are abbreviations for calls to
an implementer-defined macro which converts an AMOP opcode or a REF into the desired object language
representation. For example, the macro definition for '+i' (integer addition) in the HIS-6000
implementation is


+i: "    AD#R    #S"


If the first operand location (which is also the result location) is the A register and the second operand is
an external variable "X", then the code produced by this macro definition is


ADA    X


which adds the contents of "X" to the A register. A macro definition can also contain character strings
whose inclusion in the expansion of a macro call is conditional upon the locations of the operands and/or
result. An example is the HIS-6000 macro definition for '<<' (left shift):


<<:

(,intlit,):     "    #FLS    %o(#'S)"

(,~intlit,):    "    LXL5    #S
                     #FLS    0,5"


which produces different code sequences depending upon whether or not the second operand (the
number of bit-positions to shift) is an integer constant. A macro definition may include references to the
arguments of the macro call using the character sequences #0, #1, ... #9; a macro definition may also
include embedded macro calls, such as the call applied to the second operand in the alternative above
that handles an integer constant.
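
The substitution of #O, #F, #S, and #R can be sketched as a simple template expander. This is a minimal illustration under stated assumptions: the function name and signature are hypothetical, the symbolic strings are supplied by the caller (standing in for the implementer-defined conversion macro), and conditional strings, #0 to #9 argument references, and embedded macro calls are not handled.

```c
#include <string.h>

/* Expand tmpl into out (capacity n), replacing #O, #F, #S and #R by the
   given strings. Returns 0 on success, -1 if the output does not fit. */
int expand_macro(const char *tmpl, const char *op, const char *first,
                 const char *second, const char *result,
                 char *out, size_t n)
{
    size_t used = 0;
    if (n == 0)
        return -1;
    while (*tmpl) {
        const char *piece = NULL;
        size_t len = 1;
        if (tmpl[0] == '#') {
            switch (tmpl[1]) {
            case 'O': piece = op;     break;
            case 'F': piece = first;  break;
            case 'S': piece = second; break;
            case 'R': piece = result; break;
            }
        }
        if (piece) {
            len = strlen(piece);      /* substitute the symbolic form */
            tmpl += 2;
        } else {
            piece = tmpl;             /* copy one literal character */
            tmpl += 1;
        }
        if (used + len >= n)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Expanding the template "AD#R #S" with result "A" and second operand "X" gives "ADA X", matching the integer addition example.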


A macro definition may also be specified in the form of a C routine. C routine macro definitions are used
when processing is needed which is beyond the capabilities of the simple macro scheme so far described.
C routine macro definitions may define global variables, perform arithmetic and logical operations, and
select code sequences on conditions other than operand locations. In the present implementation,
however, C routine macro definitions are unable to interact with the code generation algorithm. In the
HIS-6000 implementation, C routine macro definitions are used to translate REFs into GMAP symbols, to
translate the source language representations of identifiers and floating-point constants into GMAP, to
define character string constants, and to buffer characters while defining storage for variables (GMAP
does not have a byte location counter, as is assumed in the intermediate language). The C routine macro
definitions used in the HIS-6000 implementation are listed in Appendix V.


3. Generating Code for an Abstract Machine


The most interesting part of the compiler is the code generator since, unlike most code generators which
produce code for a fixed target language, the code generator of the C compiler is designed to produce
code for a class of abstract machines.


3.1 Functions of the Code Generator


The code generation process consists of three fairly distinct functions. First, there is the generation of
intermediate language statements to define and initialize static data areas and constants. Second, there is
the translation of source language control structures into labels and branches. Third, there is the
translation of source language expressions into sequences of abstract machine operations.


The C compiler is designed to produce assembly language code for conventional machines; thus, the
intermediate language statements for defining and initializing static data areas directly correspond to
assembly language statements which define symbols, define constants, and align the location counter. The
only complication is that the code generator must use the size and alignment information from the machine
description in order to specify the sizes and alignments of data areas. More information and redundancy
could be added to the intermediate language in order to accommodate a larger class of target languages;
see [16] for examples. Another possible improvement would be to emit segment-specifying instructions
so that the output could be segregated into different segments according to whether it is code, pure data,
impure data, or uninitialized data.
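
The use of size and alignment information when laying out static data can be sketched as follows. The helper names are hypothetical, and the alignment values used in the usage note are assumptions for illustration; the text gives the HIS-6000 sizes (1, 4, and 8 bytes) but not the alignment figures.

```c
#include <stddef.h>

/* Size and alignment of one abstract machine data type, in bytes. */
struct type_info { size_t size, align; };

/* Round offset up to the next multiple of align (align must be > 0). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Place an object at the next suitably aligned byte offset, advance the
   byte location counter past it, and return the object's offset. */
size_t place_object(size_t *counter, struct type_info t)
{
    size_t at = align_up(*counter, t.align);
    *counter = at + t.size;
    return at;
}
```

Placing a char, an int, a char, and a double in that order, with assumed alignments of 1, 4, and 8, yields offsets 0, 4, 8, and 16; the gaps are what an ALIGN keyword macro would have to produce in the output.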


The process of translating source language control structures into labels and branches is rather
straightforward. The only complications come when emitting conditional branches which test the value of
an expression; these problems are covered in the next section.


3.2 Generating Code for Expressions 


The generation of code for expressions is the most difficult part of the problem. The code generator 
must generate a correct sequence of abstract machine instructions to carry out the indicated operations. 
The operand and result locations it specifies in the abstract machine instructions must conform to the 
location definitions provided in the machine description. Moreover, the code generator must keep track of
the locations of all intermediate results and correctly administer the abstract machine registers and 
temporary locations. 


The generation of code for expressions is performed in two steps, semantic interpretation and code
generation.


3.2.1 Semantic Interpretation 


The code generator receives expressions in the form of syntax trees whose interior nodes are source 
language operators and whose leaf nodes are identifiers and constants. Thus, an expression can be 
considered to consist of a “top-level” operator along with zero or more operand expressions. The first 
step in the processing of an expression consists of translating a tree in this form to a more descriptive 
form whose interior nodes are AMOPs. This translation involves checking the data types of operands, 
inserting conversion operators where necessary, and choosing the appropriate AMOPs to express the 
semantics of the source language operators. The selection of an AMOP to replace a source language 
operator is based primarily on the data types of the operands. For example, on this basis, an addition
operator may be translated into either integer addition, double-precision floating-point addition, or one of
a number of pointer addition AMOPs. However, it is useful to be able to choose AMOPs also on the basis
of what is provided in the machine description. The basic idea is that of defaults. If the semantics of a
particular AMOP can be expressed in terms of a composition of more basic AMOPs, then the AMOP can be
left undefined in the machine description; the code generator can use the equivalent composition of
AMOPs instead. The advantage of having optional AMOPs is that the implementer need define one of
these optional AMOPs in the machine description only if his definition will result in sufficiently better code
than will be produced using the equivalent composition of more basic AMOPs.


An example of this technique is the handling of a class of C operators called assignment operators. An
example of an assignment operator is '=+', where “L =+ R” is defined to be the same as “L = L + R” except
that the expression L is evaluated only once (it may contain side-effects). Consider an expression
“L =op R.” If the corresponding abstract machine assignment operator is defined in the machine
description, then the source language operator is translated into that abstract machine
operator; otherwise, the expression “L =op R” is converted to the equivalent form “L = L op R”, except
that there is only one copy of “L” having two pointers to it (a flag is set in the root node of “L” so that
later routines will recognize this fact). Therefore, a particular abstract machine assignment operator need
be included in the machine description only if the code sequences it generates are better than the code
that would be generated by the equivalent assignment expression. An example from the HIS-6000
implementation is the abstract machine operator '=+i' (integer addition-assignment) which is translated
into an add-to-storage instruction. The corresponding floating-point assignment operator '=+d' is not
defined in the machine description since no floating-point add-to-storage instruction exists on the
machine.
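
The default treatment of an assignment operator can be sketched as a small tree rewrite. The node layout, operator codes, and function name below are hypothetical; the sketch shows only the idea of keeping a single copy of “L” with two pointers to it and a flag in its root node.

```c
#include <stdlib.h>

enum { OP_LEAF = 0, OP_ASSIGN, OP_ADD, OP_ADD_ASSIGN };

struct node {
    int op;                    /* operator code; OP_LEAF marks a leaf */
    int shared;                /* set when two parents point at this node */
    struct node *left, *right;
};

/* Lower "L =op R" (root e). If the machine description defines the
   assignment AMOP, the tree is kept as is. Otherwise it is rewritten
   in place to "L = L op R" with a single shared copy of L.
   Returns the root, or NULL if allocation fails. */
struct node *lower_assign_op(struct node *e, int base_op, int amop_defined)
{
    struct node *rhs;

    if (amop_defined)
        return e;

    rhs = malloc(sizeof *rhs);
    if (rhs == NULL)
        return NULL;
    rhs->op = base_op;
    rhs->shared = 0;
    rhs->left = e->left;       /* second pointer to the one copy of L */
    rhs->right = e->right;

    e->left->shared = 1;       /* later routines must evaluate L only once */
    e->op = OP_ASSIGN;
    e->right = rhs;
    return e;
}
```

After the rewrite, the left child of the assignment and the left child of the new operator node are the same node, so a later pass can detect the shared flag and avoid evaluating “L” twice.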


Other examples of optional AMOPs which have been implemented are the pointer comparison operators
for pointers other than class P0 pointers (the default is to convert to the “greatest common denominator”
pointer class for which the operation is implemented) and the test for null/non-null pointer operators (the
default is to convert the pointer to an integer and test for equality/inequality with 0). Other
candidates for being optional AMOPs are the various increment and decrement operators.


3.2.2 Code Generation 


The second step in the processing of an expression is the generation of a sequence of abstract machine
instructions to carry out the evaluation of the expression. This code generation is performed by a set of
recursive routines, some of which will be described in this section. The operation of the code generation
routines is basically top-down. When a call is made to generate code to evaluate an expression, a set of
desired locations for the result of that evaluation is also specified. This specification, along with other
available information about the operands of the top-level operator of the expression, is used to choose
one of the OPLOCs from the top-level operator's location definition in the machine description (location
definitions are described in section 2.2.1). From the chosen OPLOC and, possibly, the desired locations for
the result of the expression are derived sets of desired locations for the operands of the top-level
operator. Recursive calls are then made to generate code to evaluate the operands into these desired
locations. Next, an abstract machine instruction is emitted for the top-level operation. Finally, if
necessary, abstract machine instructions are emitted to move the result of the operation to an
acceptable location.


3.2.2.1 Specifying Desired Locations 


A set of desired result locations is specified by a structure called a LOC. A LOC structure has two integer
members, LOC.FLAG and LOC.WORD. The possible values of LOC.FLAG are listed below along with their
interpretations:


LOC.FLAG   interpretation

0          the “result” is the internal label specified by LOC.WORD (used only for
           conditional jump AMOPs)

1          the result is to be placed in a register; acceptable registers are specified by
           one-bits in LOC.WORD (bit 0 corresponds to register number 0, etc.)

2          the result is to be placed in memory; acceptable classes of memory references
           are specified by one-bits in LOC.WORD (this field is used only to select registers
           for pointers in indirect references)

3          the result may be left in any location acceptable for values of the particular
           data type


Note that a particular memory location is never specified as the desired location for a result; rather,
classes of possible memory locations are specified.


For convenience, if the LOC passed to the top-level code generation routine specifies that the result is 
desired in a register, then all registers not capable of containing the particular data type of the 
expression being evaluated (as defined in the TYPE statement of the machine description) are removed 
from the LOC. Similarly, if the LOC specifies memory reference classes, then all indirect classes where the 
pointer register is unable to hold pointers of the corresponding pointer class (as specified by the TYPE 
statement) are removed from the LOC. Thus where the code generator simply desires that a value be in a 
register, it may provide a LOC specifying that the result may be left in any register. 


The removal of “impossible” registers from a LOC is not performed when such an action would leave no 
remaining acceptable registers; this situation can actually occur in certain special cases, such as return 
statements, where an operation requires a value in a register not normally used to hold values of that 
type. 
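
The pruning of a register LOC, including the exception for the case where nothing would remain, can be sketched as below. The struct and function names are hypothetical, and only the register case (LOC.FLAG equal to 1) is modeled; the analogous pruning of indirect memory reference classes is omitted.

```c
enum { LOC_LABEL = 0, LOC_REGISTER = 1, LOC_MEMORY = 2, LOC_ANY = 3 };

struct loc {
    int flag;        /* LOC.FLAG */
    unsigned word;   /* LOC.WORD: label number or set of one-bits */
};

/* Remove from a register LOC every register that cannot hold the data
   type (can_hold has a one-bit per capable register), unless doing so
   would leave no acceptable register at all. */
struct loc narrow_loc(struct loc want, unsigned can_hold)
{
    if (want.flag == LOC_REGISTER) {
        unsigned kept = want.word & can_hold;
        if (kept != 0)
            want.word = kept;
    }
    return want;
}
```

A LOC that allows any register is thereby reduced to the registers named for the type in the TYPE statement, while a LOC naming only an unusual register (as for a return statement) passes through unchanged.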


3.2.2.2 TTEXPR 


The top-level code generation routine is TTEXPR. The function of TTEXPR is to generate a sequence of 
abstract machine instructions which will evaluate a given expression and leave the result in an acceptable 
location, as specified by a LOC parameter. The operation of TTEXPR begins with the removal of 
impossible cases from the LOC parameter, as described above. Then, TTEXPR passes the expression and 
LOC parameters to a routine CGEXPR, which generates abstract machine instructions to evaluate the 
expression, using the LOC parameter as a non-binding indication of preference. Finally, TTEXPR calls the 
routine CGMOVE to emit, if necessary, abstract machine instructions to move the result to an acceptable 
location. 


3.2.2.3 CGEXPR


The function of CGEXPR is to generate a sequence of abstract machine instructions which will evaluate a
given expression. CGEXPR is given a LOC argument which specifies preferred locations for the result of
the expression; however, unlike TTEXPR, this specification is non-binding and is used only where a choice
exists.


The operation of CGEXPR consists basically of testing for a set of special cases and then performing the 
appropriate action, which is usually to call another routine which does the real work. The first special 
case is where the expression node is shared and the expression has already been evaluated; in this case, 
no action need be taken. Another special case is where the top-level operator is a conditional AMOP and 
a value is desired (as opposed to a jump, which is the usual case); in this case, a routine JUMPVAL is 
called to emit the desired code. The other special cases involve particular top-level operators: 
indirection, assignment, conditional expression, function call, and the “leaves” of the expression tree,
identifiers and literals; in these cases, the code generation routine corresponding to the particular
top-level operator is called. Finally, in all other cases, the routine CGOP is called to emit code to evaluate the
expression.


3.2.2.4 CGOP 


The function of CGOP is to emit code to evaluate an expression whose top-level operator is not one 
special-cased by CGEXPR. Like CGEXPR, CGOP is passed a LOC indicating non-binding preferences for the 
location of the result of the expression. 


The operation of CGOP is performed in six steps. First, a routine CHOOSE is called to select an OPLOC 
from the top-level operator’s location definition in the machine description. Second, desired locations for 
the operands of the top-level operator are determined. Third, a routine EXPR2 is called which makes 
recursive calls on TTEXPR to emit code to evaluate the operands into the desired locations. Fourth, code 
is emitted to save any registers which are specified in the machine description to be clobbered by the 
execution of the top-level operator. Fifth, the exact location of the result of the expression is 
determined. Sixth, the actual abstract machine instruction for the top-level operator is emitted.


If the result location specified by the LOC parameter is a label, or if the selected OPLOC specifies that the
result is left in the first or second operand location, then the exact location of the result of the 
expression is fixed. Otherwise, a particular register must be chosen from the set of registers specified in 
the result field of the OPLOC (the compiler is currently unable to handle OPLOCs which specify a set of 
memory references as the location of the result). In the search for a result register, the priorities are as 
follows: first, free registers which are preferred result locations; second, busy registers which are 
preferred result locations; third, free registers which are not preferred result locations; and fourth, busy 
registers which are not preferred result locations. If a busy register is selected, register contents are 
saved in temporary locations as necessary. 


For the purposes of finding a result register, a register containing an operand is considered free and a
register containing a pointer to an operand is given lowest priority. A register containing a pointer to an
operand is protected because the implementation of an AMOP may alter the contents of the result register
before the operand referenced by the pointer in that register is used. An example is the following
HIS-6000 code for the AMOP '+p1' (addition of an integer to a pointer to a word-aligned object):


LXL0    I
ADLX0   P


This code loads index register 0 with the integer I and then adds to register 0 the pointer P. (The code
for the AMOP includes the load instruction since in general integers cannot be stored in the HIS-6000
index registers as they are only halfword registers.) If the code generated for P leaves P referenced
through index register 0, the load instruction will “clobber” register 0 before P is accessed by the add
instruction:


LXL0    I
ADLX0   0,0


However, if index register 0 is protected, index register 1 will be chosen instead to hold the result,
producing the following correct code:


LXL1    I
ADLX1   0,0
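

The search order for a result register, together with the protection of pointer registers, can be sketched as a ranking over the candidate registers. The function name, the bit-set arguments, and the numeric ranks are assumptions made for this illustration; the saving of busy registers into temporaries is not shown.

```c
/* Choose a result register from the candidates (one bit per register).
   Ranking, best first: free and preferred, busy and preferred, free and
   not preferred, busy and not preferred, and last of all a register that
   holds a pointer to an operand. A register holding an operand itself
   counts as free. Returns the register number, or -1 if there are no
   candidates. */
int choose_result_register(unsigned candidates, unsigned preferred,
                           unsigned busy, unsigned holds_operand,
                           unsigned points_to_operand, int nregs)
{
    int best = -1, best_rank = 5;

    for (int r = 0; r < nregs; r++) {
        unsigned bit = 1u << r;
        int is_busy, rank;

        if (!(candidates & bit))
            continue;
        is_busy = (busy & bit) && !(holds_operand & bit);
        if (points_to_operand & bit)
            rank = 4;                      /* protected: lowest priority */
        else if (preferred & bit)
            rank = is_busy ? 1 : 0;
        else
            rank = is_busy ? 3 : 2;
        if (rank < best_rank) {
            best = r;
            best_rank = rank;
        }
    }
    return best;
}
```

In the '+p1' example, index register 0 holds the pointer through which P is referenced, so it is ranked last and index register 1 is selected.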


3.2.2.5 Selecting an OPLOC


The purpose of OPLOC selection is to select a set of operand/result locations for the top-level operator 
of an expression by choosing one of the OPLOCs from the location definition of the operator in the
machine description. The choice of operand/result locations will affect the amount of code produced to 
evaluate the expression, both because of different code sequences which may be produced by the macro 
definition for the operator and because of additional loading, storing, and saving operations which may be 
required in order to set up the operands and move the result to an acceptable location. A general 
solution, taking into account all possible locations of operands and results, is a complex optimization 
problem. Instead, a more limited approach has been taken which uses the provided preferences for 
result locations and available information about the possible result locations of the top-level operators in 
the operand subexpressions. For example, if an operand is an identifier, then its location is known to be 
a memory reference of a particular class. Similarly, various operators may be defined in the machine 
description to always place their result in one of a particular set of registers. Using information of this 
sort, plus knowledge about the current register usage, a rough estimate can be made of the number of 
additional load and store instructions which will be required for each OPLOC in the location definition; 
from the set of OPLOCs, the one with the lowest additional cost is chosen. 


For example, consider the expression “I + (J / K).” (For clarity, source language operator symbols are 
used in this example to represent the corresponding integer abstract machine operations.) Assume the 
following location definitions (the OPLOCs are numbered for future reference): 


+:   r,r,1;          (1)
     r,M,1;          (2)
     M,r,2;          (3)

/:   r1,r,1 [r2]     (4)
     r2,r,1 [r3]     (5)
     r3,r,1 [r4]     (6)
     r1,M,1 [r2]     (7)
     r2,M,1 [r3]     (8)
     r3,M,1 [r4]     (9)


Here M represents all memory reference classes and r represents a set of general registers consisting of
r1, r2, r3, and r4. The division operator is modeling a machine instruction which produces pairs of results
(the quotient and remainder) in adjacent registers. For the division abstract machine operator, only the
quotient is used; the other register is considered to be “clobbered” by the execution of the operator.
Note that one can deduce from these location definitions that both operators always leave their results in
general registers.


The generation of code for the expression “I + (J / K)” begins with the selection of an OPLOC from the
location definition of the '+' operator. In this case, all of the OPLOCs specify the same set of result
locations (the general registers); thus, the desired locations for the result of the expression do not
affect the choice of OPLOCs. Instead, the choice is made on the basis of the possible locations for the
operands. In this case, the first operand is a variable I which is known to be a memory reference of a
particular class. The second operand is the result of a division operator which is known to leave its
results in either r1, r2, or r3. On this basis, OPLOC (3) is chosen because no extra operations are needed
to move the operands into acceptable locations, whereas both OPLOCs (1) and (2) do require such extra
operations.
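
The choice among the '+' OPLOCs can be sketched with a deliberately crude cost estimate: one extra load or store for each operand whose possible locations are not all acceptable to the OPLOC. This is a simplification assumed for the example, with hypothetical names; the estimate described in the text also takes result preferences and current register usage into account.

```c
typedef unsigned locset;

enum { R1 = 1u << 0, R2 = 1u << 1, R3 = 1u << 2, R4 = 1u << 3,
       MEM = 1u << 4 };
#define R_ALL (R1 | R2 | R3 | R4)

struct oploc { locset first, second; };  /* operand location sets only */

/* One extra move is assumed if the operand might end up somewhere the
   OPLOC does not accept. */
static int extra_moves(locset possible, locset accepted)
{
    return (possible & ~accepted) != 0;
}

/* Return the index of the cheapest OPLOC (first one wins on a tie),
   or -1 if the location definition is empty. */
int choose_oploc(const struct oploc *defs, int n,
                 locset first_possible, locset second_possible)
{
    int best = -1, best_cost = 0;

    for (int i = 0; i < n; i++) {
        int cost = extra_moves(first_possible, defs[i].first)
                 + extra_moves(second_possible, defs[i].second);
        if (best < 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

For “I + (J / K)”, the first operand is in memory and the second may be in r1, r2, or r3, so the estimate is 1 for OPLOC (1), 2 for OPLOC (2), and 0 for OPLOC (3), and the third entry (index 2) is selected.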


Next, a recursive call is made to generate code to evaluate the subexpression “J / K.” The desired
locations for the result of this expression are those specified by the chosen '+' OPLOC for its second
operand, namely r, the set of general registers. However, since the '+' OPLOC specifies that the second
operand location is also the location of the result of the '+' operator, the intersection of that location set
with the set of desired locations for the result of the '+' operator is used instead, if that intersection is
non-null. Thus, the following factors are used in selecting an OPLOC for the '/' operator: first, which of
the possible result registers (r1, r2, r3) are desired result locations; second, which of the possible result
registers are free; and third, which of the clobbered registers (r2, r3, r4) are free. In this particular
situation, the possible location of the first operand is a memory reference, and thus does not favor any
of the OPLOCs. However, the second operand is also known to be a memory reference, which favors
OPLOCs (7), (8), and (9).


In addition, when selecting an OPLOC from a location definition, certain OPLOCs may be rejected entirely
because they specify conditions which cannot be met. For example, if an OPLOC specifies (either directly
or indirectly through an operand location) that the result is left in a register, but the result is desired in
memory, then that OPLOC will be rejected if a temporary location is not acceptable. The OPLOC is
rejected because, given a value in a register, the only method by which the code generator can
make that value into a memory reference is by saving it in a newly allocated temporary location. (Recall
that a specific memory location is not provided for the result, only a set of acceptable memory reference
classes.) Similarly, if the result will be in memory and is desired in memory, then that OPLOC will be
rejected if there are one or more possible memory reference classes which are not acceptable
result locations; this is done because the code generator is not capable of transforming a memory
reference from one class to another. Similar checking is performed on the operand location specifications
in the OPLOC: if an operand is required by the OPLOC to be in memory but not all non-indirect memory
reference classes are allowed, then that OPLOC will be rejected if the operand operator is not guaranteed
to place its result in an acceptable memory location or if it can place its result in a register but
temporary locations are not acceptable. These restrictions allow a location definition to contain extra
OPLOCs which apply only in special cases, since such OPLOCs will never be selected unless the special
cases hold.


An example of how the OPLOC selection method can be utilized in the writing of a machine description is
the following definition of the '+p1' AMOP (addition of an integer to a pointer to a word-aligned object)
taken from a hypothetical HIS-6000 machine description (the described OPLOC selection method was not
implemented at the time the actual HIS-6000 machine description was written). The shortest code for
executing the '+p1' operation in the general case is


LXL0    I
ADLX0   P


where I is the integer in the low-order half of a word in memory and P is the pointer in the high-order
half of a word in memory. The result of the operation is left in an index register. The OPLOC for this
code sequence is


M,M,x;


However, if both the integer and the pointer must be computed into registers (which occurs frequently in
referencing elements of an array), the integer and the pointer must first be stored into temporary
locations before this code sequence can be applied. Therefore, using the given code sequence under
these circumstances results in excessive code. The desired code is


ALS     18
STA     TEMP
ADLX0   TEMP


which shifts the integer in the general register into the high-order halfword, stores it into a temporary
location, and adds it to the pointer in the index register. The OPLOC for this code sequence is


x,r,1;


In the case where the pointer is in an index register and the integer is a constant "n", then the desired
code is


EAX0    n,0


with an OPLOC of


x,intlit,1;


The described OPLOC selection method allows all three OPLOCs to be included in the location definition for
'+p1'. In particular, it guarantees that the third OPLOC will never be selected unless the second operand
is an integer constant.


3.2.2.6 Generating Code for Subexpressions 


After an OPLOC has been selected, CGOP calls a routine EXPR2 to make recursive calls on TTEXPR to
generate code to evaluate the operands of the top-level abstract machine operator. The LOC arguments
passed to TTEXPR in these calls are taken from the operand fields of the selected OPLOC and, in the case
of operators which place their result in an operand location, the desired locations for the result of the
top-level operator. If there are two operands, EXPR2 makes sure that the two operands will not require
the use of the same register (for example, by using a register to hold both one operand and a pointer to
the other operand); this is done by checking the LOCs for "overlap" and removing certain possibilities. In
addition, EXPR2 evaluates first the operand which is more complicated on the basis of the sizes of the
subtrees for the two operands; this tends to reduce the number of saving and restoring operations
performed. In the course of generating code to evaluate an operand of a binary abstract machine
operator, it may be necessary to use the register containing the already computed value of the other
operand or a pointer used to reference it, in which case code is generated to save the contents of this
register in a temporary location. Thus, after generating code to evaluate both operands, EXPR2 calls a
routine RESTORE to generate code, if necessary, to restore the saved value to its original register.
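

The ordering rule used by EXPR2 can be sketched as follows. The node layout and the names tree_size and
first_to_evaluate are hypothetical, and the tie-breaking rule (keep source order when the subtrees are the
same size) is an assumption; the text above says only that the more complicated operand is evaluated first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree node; a leaf has two NULL children. */
struct node {
    struct node *left, *right;
};

/* Number of nodes in the subtree rooted at t. */
size_t tree_size(const struct node *t)
{
    if (t == NULL)
        return 0;
    return 1 + tree_size(t->left) + tree_size(t->right);
}

/* Return 1 if the first operand should be evaluated first, 2 if the
   second should. The larger subtree goes first; ties keep source order. */
int first_to_evaluate(const struct node *op1, const struct node *op2)
{
    assert(op1 != NULL && op2 != NULL);   /* a binary operator has both */
    return tree_size(op2) > tree_size(op1) ? 2 : 1;
}
```

For "I + (J / K)" the second operand is a subtree of three nodes and the first is a single leaf, so the
division would be evaluated first under this rule.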


3.2.2.7 Register Management 


The status of the various abstract machine registers with regard to register allocation is contained in an 
array of structures called REGTAB. Each element structure of the array represents the current state of 
one abstract machine register. An element structure consists of two members: UCODE, an integer
indicating the current use of the register, and REP, a pointer to the subexpression tree whose value is 
currently in the register. The possible values of UCODE are listed below with their interpretations: 


UCODE    Interpretation

0        the register is free
-1       the register contains the value of the expression pointed to by REP
-2       the register has been marked "do not use unless necessary" for the purpose of
         finding a register for the result of an AMOP; although the register contains a pointer
         to one of the operands of the AMOP, it is free in that it may be selected as a last
         resort without having to save its contents.
n>0      the register does not directly contain a value, but there are "n" conflicting registers
         containing values which must be saved before this register can be used.
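

A declaration along the following lines would match the description of REGTAB. It is only a sketch: the
identifiers regstate, regtab, U_FREE, U_VALUE, U_MARKED, and needs_save are hypothetical, and the register
count of 8 is an assumption; only the member names UCODE and REP and the UCODE values come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 8            /* assumed number of abstract machine registers */

struct expr;               /* subexpression tree node; layout not needed here */

/* Symbolic names (hypothetical) for the UCODE values listed above. */
enum { U_FREE = 0, U_VALUE = -1, U_MARKED = -2 };

struct regstate {
    int ucode;             /* UCODE: current use of the register          */
    struct expr *rep;      /* REP: expression whose value is held, if any */
};

/* REGTAB: one element per abstract machine register. Static storage is
   zero-initialized, so every register starts out free. */
struct regstate regtab[NREGS];

/* True if something must be saved before register r can be used: either
   r itself holds a value (-1), or n > 0 conflicting registers do. */
int needs_save(int r)
{
    assert(r >= 0 && r < NREGS);
    return regtab[r].ucode == U_VALUE || regtab[r].ucode > 0;
}
```

Note that a marked register (UCODE of -2) does not need saving, which is what allows it to be selected as
a last resort.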


The routines used in register management are described below:


CLEAR(R)        - Register R, which must directly contain the value of an expression, is made
                  available for use; its current value is not saved.

ECLEAR(E)       - The register associated with the expression E, if any, is CLEARed.

FREEREG(W)      - A register from the set specified by W is made available for use; the
                  contents of registers are saved if necessary.

GETREG(W1,W2)   - If possible, an unmarked register from the set W1 is made available for
                  use. Otherwise, if possible, an unmarked register from the set W2 is made
                  available for use. Otherwise, a marked register from the set W1 is made
                  available for use. Within each set, free registers are chosen in preference
                  to busy registers; if a busy register is chosen, its contents are saved.

MARK(E)         - If the expression E is an indirect reference, the register containing the
                  pointer is marked "do not use unless necessary."

NBUSY(W)        - Return the number of busy registers in the set W.

NFREE(W)        - Return the number of free registers in the set W.

RESERVE(R,E)    - Register R is allocated to hold the value of the expression E. Register R
                  must be available for use.

RESTORE(E)      - If the value of the expression E (or a pointer in the case of an indirect
                  reference) has been saved in a temporary location, it is restored to the
                  original register.

SAVE(R)         - Register R is made available for use by saving the contents of whatever
                  registers are necessary.

UNMARK(E)       - Undo a MARK.
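

The order of preference described for GETREG can be sketched as follows. This is an illustration under
stated assumptions, not the actual routine: register sets are represented as bit masks, the UCODE values
are passed in as a plain array, the names pick and getreg are hypothetical, and instead of generating code
to save a busy register the sketch only reports that a save would be required.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 8   /* assumed number of abstract machine registers */

/* Scan the register set `set` (bit r stands for register r) for a register
   whose marked status equals want_marked. A register that needs no saving
   (UCODE 0 or -2) is returned at once; otherwise the first busy candidate
   is returned, or -1 if there is no candidate at all. */
static int pick(const int ucode[], unsigned set, int want_marked)
{
    int busy = -1;
    for (int r = 0; r < NREGS; r++) {
        int marked = (ucode[r] == -2);
        if (!(set & (1u << r)) || marked != want_marked)
            continue;
        if (ucode[r] == 0 || marked)
            return r;
        if (busy < 0)
            busy = r;
    }
    return busy;
}

/* Select a register as described for GETREG(W1,W2). Returns the register
   number, or -1 if none qualifies; *must_save is set to 1 when the chosen
   register is busy and its contents would have to be saved. */
int getreg(const int ucode[], unsigned w1, unsigned w2, int *must_save)
{
    assert(must_save != NULL);
    int r = pick(ucode, w1, 0);          /* an unmarked register from W1 */
    if (r < 0)
        r = pick(ucode, w2, 0);          /* else an unmarked one from W2 */
    if (r < 0)
        r = pick(ucode, w1, 1);          /* else a marked one from W1    */
    *must_save = (r >= 0 && (ucode[r] == -1 || ucode[r] > 0));
    return r;
}
```

Following the wording of the description literally, a busy unmarked register in W1 is preferred to a free
register in W2; the preference for free registers applies only within a single set.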


The following is a typical series of calls made by CGOP in the generation of code for an expression E
whose top-level operator is a binary operator with operands OP1 and OP2:


OPLOC=CHOOSE(E,LOC)     choose an OPLOC
EXPR2(OP1,OP2)          recursively generate code to evaluate
                        the operands into acceptable locations
ECLEAR(OP1)             make operand registers available for
ECLEAR(OP2)             the result
SAVE(s)                 save "clobbered" registers, if any
MARK(OP1)               mark registers used to hold pointers
MARK(OP2)               to operands
R=GETREG(s,*)           select a result register
UNMARK(OP1)             unmark any marked registers
UNMARK(OP2)
RESERVE(R,E)            reserve result register


3.2.2.8 Possibilities for Failure 


The code generator can fail in two ways: (1) it can reach an impossible situation and announce a compiler 
error, and (2) it can unknowingly generate incorrect code. Examples of impossible situations are (1) 
discovering that there are no acceptable OPLOCs in the location definition for an operator, (2) being told 
that the result must be placed in a register from the empty set of registers, and (3) discovering that an 
essential location definition or macro definition of an abstract machine operator was not provided by the
implementer. The most likely cause of a failure is an incorrect machine description. Examples of errors
which can be made in the machine description are (1) an OPLOC specifying that both operands must be in
the same register, (2) an OPLOC specifying a set of memory reference classes for the result location, (3) a
macro definition containing errors, and (4) a macro definition which does not anticipate a particular
operand or result location, or combination thereof, allowed by the location definition or otherwise
essential (in the case of move operations which must be capable of moving among registers and between
registers and memory). Some of these errors could be detected by the program which processes the
machine description (GT). Another possible cause of failure is an abstract machine with an insufficient
number of registers. Such a machine may require that a register be used to hold both a pointer to an
operand and the result of an operation; as described above, this situation may result in incorrect code.
Hopefully, abstract machine models of real machines will not suffer from this problem. Of course, the
other possible cause of failure is a bug in the code generator itself. It would be interesting and useful if
such a code generation algorithm could be proven correct, given sensible restrictions on the machine 
description and the assumption of correct macro definitions.


4. Conclusions


This paper has described the implementation of a portable compiler for the programming language C. The 
compiler was first implemented by the author in a seven-month period on the Bell Laboratories Computer
Science Research Center's PDP-11/45 UNIX system. The compiler was then used to compile itself, and the
resulting code moved to the HIS-6000. Another [...] was spent on the compiler until the
version of the compiler on the HIS-6000 compiled itself. This was regarded as a
significant test of the compiler.


4.1 The Compiler 


The major problem with the compiler itself is its speed. The compiler appears to be more than twice as
slow as other compilers for similar source languages. This slowness is due almost entirely to the use of a
macro expansion phase (a phase not likely to be present in ordinary compilers), since the compiler tends
to spend half or more of its time in the macro expansion phase. The slowness of the compiler seems to
be a problem inherent in the chosen compiler structure; no amount of mere recoding is likely to
significantly reduce the percentage of time spent in the macro expansion phase. One approach toward
improving the speed of the compiler would be to eliminate non-essential processing such as the
construction and interpretation of character-string representations of macro calls and the rescanning of
macro definitions. The macro language could be modified so that the result of the expansion of a macro
call would never be needed as an argument to another macro call and thus could be printed directly,
rather than returned as a string and rescanned. Given this restriction, the macro definitions could be
compiled into procedures which simply print strings and call other procedures. These procedures could
be called directly by the code generator; alternatively, they could be called by a procedure which
interprets a suitable encoding of the intermediate language.


A second problem with the compiler is its size, in terms of both the amount of file space necessary to
support an implementation of the compiler and the amount of memory required to execute the compiler
phases. The source of the compiler is about 250K characters, the source of GT is about 80K characters;
thus, the file space required for source, object libraries, and executable files is on the order of 1M
characters. Only the size of the code of the code generator is a result of designing the compiler to be
portable; it is likely that a code generator designed for a specific machine would be much smaller. Other
reasons for the large size of the compiler stem from the particular programming techniques used. In
particular, keeping the entire tree representation of a function in core at one time during code generation
requires that a large block of storage be reserved. Also, the use of a bottom-up, table-driven LALR(1)
parser seems to result in a larger syntax analysis phase than would result from using recursive descent,
as does the UNIX C compiler. The large size of the compiler limits the number of computer systems which
can support the compiler.


Despite these problems, it is believed that were one prepared to make the investment necessary to 
implement C on another machine, the size difficulties and related costs would be outweighed by the 
relative speed with which one could bring up a working implementation. One could then concentrate on 
making it more efficient, having the advantages of a C compiler to work with and the ability to program in 
C. 


The least flexible machine-dependent component of the compiler is the code generation algorithm. It is
acknowledged that a clean mechanism for allowing the implementer to tailor the code generation algorithm
through the addition of procedural knowledge would be an improvement. On the other hand, clinging to
the idea that the code of the compiler will never be touched is unrealistic. A likely prospect for
modification is the code related to the calling sequence since it may be desired to use a system standard
calling sequence instead of the one built into the compiler. Another problem which would be solved most
easily by modifying the code generator is the IBM S/360 addressing problem. Because an S/360
instruction cannot contain an arbitrary memory address, C external variables must be referenced by first
loading a register with a pointer to the variable (an address constant) and then using the register as a
base register in the actual instruction. These actions could be performed by the macro definitions using
conditional expansion; however, it would be easier to modify the code generator to handle this particular
case.


The most direct method of moving a portable compiler based on a machine description requires access to 
an existing implementation of the compiler. The process of moving a compiler written in its own language 
from machine A to machine B is as follows: First, one writes a machine description for machine B.
Second, the machine description is used by a construction program running on machine A to produce a 
new compiler which produces code for machine B. Third, the compiler on machine A is used to compile 
the new compiler, producing a compiler which runs on machine A but produces code for machine B. 
Fourth, the new compiler is used to compile itself, producing a compiler which runs on machine B and 
produces code for machine B. This process is called a half bootstrap. On the other hand, the Poole and 
Waite approach does not require the use of an existing implementation. One need write only an 
interpreter or a translator for a very simple abstract machine language in order to move a program to a 
new machine. This technique is called a full bootstrap. In practice, the need for a half bootstrap often 
represents a significant obstacle to moving a program. 


The full bootstrap method can be used to move a portable compiler based on a machine description as 
follows: Initially, a simple imaginary machine is defined as a vehicle for bootstrapping. A compiler which
runs on and produces code for this imaginary machine is then constructed using the half bootstrap 
method described above. Now, in order to move the compiler to a new machine, one implements an 
interpreter for the imaginary machine on the new machine. This action results in an “existing 
implementation” of the compiler, running on the new machine, which can then be used to carry out the 
half bootstrap as described above. 


4.2 The Compiled Code 


Although there are weak spots, the code produced by the compiler is good considering that it is almost 
completely unoptimized. It is certainly better than would be produced if the abstract machine were the 
typical machine-independent abstract machine with one accumulator and one index register, given the 
same complexity of the macro definitions (they do not perform register allocation). Such an 
implementation would not be able to take advantage of the HIS-6000’s two accumulators or the multiple 
index registers, nor would it recognize the fact that byte pointers cannot fit in the index registers. 


One of the weak spots in the compiled code concerns floating-point operations. The code generator
"performs" all floating-point operations in double-precision, issuing single-to-double conversion
operations before using single-precision operands. It is unable to utilize the HIS-6000 machine
instructions which operate on a single-precision operand in memory and a double-precision operand in
the F register. Since the implementation of a single-to-double conversion is to load the single-precision
operand into the F register, very poor code is produced for single-precision floating-point expressions
(as opposed to very good code for double-precision expressions). One way to handle this situation would
be to implement a general subtree-matching facility for optimization. With such a facility, the implementer
specifies in the machine description that a particular combination of abstract machine operators (specified
in the form of a tree) is to be replaced by the code generator with a new abstract machine operator; the
new operator is defined by the implementer in the machine description just like any of the built-in
operators. In the floating-point case, one would specify that a subtree of the form (using a LISP-like
notation)


( double-prec-add ( #1 , single-to-double ( #2 ) ) ) 
would be replaced by 
( single-prec-add ( #1, #2) ) 


where single-prec-add is a new abstract machine operator which would be defined to be the "FAD" 
instruction. This method of subtree-matching can be compared to the hierarchy of abstract machines
method in that the new abstract machine operators can be considered to be instructions of a higher-level
abstract machine. The differences are that, in the case of the subtree-matching method, the definition of 
higher-level operators is optional (thus there is no multistage translation when optimization is not desired 
or needed) and that the implementer defines the higher-level operators to suit his needs. The subtree-
matching approach to machine-dependent code optimization has been investigated by Wasilew [17].
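

The floating-point replacement described above can be sketched in C as a bottom-up tree rewrite. The
sketch hard-codes the single pattern from the example, whereas the proposed facility would take its
patterns from the machine description; the tree layout and the names amop, tree, and rewrite are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator codes for the operators named in the example. */
enum amop { LEAF, DOUBLE_PREC_ADD, SINGLE_PREC_ADD, SINGLE_TO_DOUBLE };

struct tree {
    enum amop op;
    struct tree *kid[2];   /* operands; unused entries are NULL */
};

/* Replace every subtree of the form
       ( double-prec-add ( #1 , single-to-double ( #2 ) ) )
   by  ( single-prec-add ( #1 , #2 ) ),
   working bottom-up and in place. Returns the number of replacements.
   The spliced-out conversion node is not freed; the caller owns the nodes. */
int rewrite(struct tree *t)
{
    int n;
    if (t == NULL)
        return 0;
    n = rewrite(t->kid[0]) + rewrite(t->kid[1]);
    if (t->op == DOUBLE_PREC_ADD && t->kid[1] != NULL
            && t->kid[1]->op == SINGLE_TO_DOUBLE) {
        assert(t->kid[1]->kid[0] != NULL);   /* a conversion has an operand */
        t->op = SINGLE_PREC_ADD;
        t->kid[1] = t->kid[1]->kid[0];       /* #2 replaces the conversion  */
        n++;
    }
    return n;
}
```

As in the example pattern, only a conversion appearing as the second operand is matched; a conversion in
the first operand would need a pattern of its own.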


Another weakness in the compiled code concerns array subscripting. Instead of placing the offset of an 
array element into an index register and performing an indexed memory reference, the code generator 
adds the offset to a pointer to the base of the array, producing a pointer (in an index register) which is 
then used to reference the array element. Thus, the code generator regards index registers only as base 
registers to hold pointers, and not as index registers to hold offsets. One reason for not implementing 
the capability of using index registers for subscripting is that this method of subscripting is often not 
possible. For example, on machines like the HIS-6000 with single-indexed instructions, this method can be 
used only for external and static arrays; all other arrays require the use of an index register just to
reference the base of the array. (Actually, one can perform double-indexing on the HIS-6000 by using 
an indirect word; however, this was not recognized at the time the compiler was written.) The capability 
of using index registers for subscripting could be implemented using the subtree-matching facility 
described above; one would test for subtrees of the form 


( pointer-add ( address-of ( extern | static ), <any> ) ) 


and replace them with a new abstract machine operator which would be defined to produce the desired
code. A more satisfying solution would give the code generator more knowledge about addressability so 
that it could use index registers for subscripting whenever possible, based on information given in the 
machine description. 


A third weakness of the compiled code is the use of indirection. The code generator only indirects 
through pointers in registers; it is unable to utilize an indirection-through-memory facility (except through 
a specific location which implements an abstract machine register). Again, a better understanding of 
addressing is what is really needed. 


4.3 Summary of Results 


This paper has presented a technique for the design of portable compilers and has demonstrated its 
practicality through the implementation of a portable C compiler. The main difference between this work 
and the previous work described in section 1.2 is that in this work, the system was designed specifically 
for the language being implemented; it is this restriction which contributes most to the practicality of the 
approach. In addition, this work has emphasized the concept of a machine-dependent abstract machine, 
thus tying together the work on portable compilers and program transferability. 


The advantages of the technique presented in this paper over the technique of rewriting some or all of 
the generation phase are (1) that the implementer can modify the compiler to produce code for a new
machine with less effort and in less time, and (2) that the implementer can be more confident in the 
correctness of the modifications. Almost the entire code of the generation phase, already tested in the 
initial implementation, is unchanged in the new implementation. This code includes the code generation 
algorithm, the register management routines, and the macro expander. Furthermore, the modifications 
which must be made are localized in two areas, the machine description and the C routine macro
definitions. The implementer is primarily concerned with the correct implementation of the individual 
abstract machine instructions. The interaction among these instructions, in terms of their correct ordering 
and the use of registers and temporary locations, is handled by the code generation algorithm and need
not be of concern to the implementer. It is this reduction in the complexity of the problem which leads 
to the increased confidence in the results of the modification. 


The portability of the compiler has been tested by the construction of a version of the compiler for the
DEC PDP-10. The initial machine description and macro definitions for the PDP-10 implementation were 
written and debugged by the author in a period of two days. 


4.4 Further Work 


There are three main directions for further work. One is to develop machine models which will allow the
generation of acceptable code for a larger class of machines. Such machine models will have the effect of
reducing the complexity of the descriptions of machines which do not completely correspond to the
machine model described in this paper. With the HIS-6000, for example, the only major area of
complexity in the machine description is that of character manipulation; one would desire a machine
model which allows the implementer to describe more conveniently the implementation of characters on
his machine. Similarly, a machine model which allows a better understanding of addressing would be
desirable.


Another direction for further work is to develop code generation algorithms which
will produce more efficient code. In particular, the problem of register allocation under complex
constraints should be examined. In addition, techniques for allowing the implementer to extend easily and
safely the code generation algorithm through the addition of procedural knowledge should be developed.
Such techniques should allow the compiler to be modified to produce code for unanticipated new
machines.


The third direction for further work is to apply the technique of portable compilers to more complicated
and more powerful languages. The technique of using a machine-independent code generation algorithm
and a machine description, even aside from portability, results in a very clean and modular code
generator. It would be interesting to see if this technique could reduce the complexity of code
generators for large languages and whether portability can be obtained without destroying the
efficiency of the object code.
$ data cz,copy
t4 . Sot Scs Ssy Ser Sma 8st Shm >>8el
$ endcopy
$ break
$ program rihs,onl
$ limits ,18k,,1000
$ prmfl hs,r,r,sny/bt5
$ prmfl el,r/w,,#/%e
$ file er,elr,5!
$ file cs,clr,5I
$ data cz,copy
bt5S . Ser Scs >>$8el
$ endcopy
$ endjob
The form of the machine description is described in detail in the following sections. Examples are taken
from the HIS-6000 implementation. The convention of writing syntactic alternatives on separate lines is
used throughout.


1. Definition Statements


The machine description begins with a series of definition statements. The definition statements are
described in the sections below in the order in which they should appear in the machine description.


1.1 The TYPENAMES Statement


The TYPENAMES statement defines the names which are used in the machine description to represent the
primitive C data types: character, integer, floating-point, and double-precision floating-point. The form of
the TYPENAMES statement is as follows:


<typenames_stmt>:   typenames ( <name_list> );
<name_list>:        <name_list> , <name>
                    <name>


The first name corresponds to the internal type number 0, the second with type 1, etc. Because the
internal type numbers are fixed in the compiler, the statement should always be (equivalent
to):


typenames (char, int, float, double);


1.2 The REGNAMES Statement


The REGNAMES statement defines the names of the abstract machine registers; these registers are
assigned internal register numbers (used in REF.BASE, section 2.1.1.2), starting with register number 0, in
the order in which they appear in the REGNAMES statement. The form of the REGNAMES statement is
similar to that of the TYPENAMES statement; for example, the REGNAMES statement used in the HIS-6000
implementation is


regnames (x0, x1, x2, x3, x4, a, q, f);


In this example, all but the F register correspond directly to actual registers on the HIS-6000: registers
X0 through X4 are the first five (out of eight) index registers, and registers A and Q are the two
accumulators. The F register is a fictitious floating-point accumulator which in reality corresponds to the
combined A, Q, and E (exponent) registers. The fact that the F register conflicts with the A and Q
registers is specified in the CONFLICT statement, described below. Only those actual machine registers
which are to be used by the code generator in producing code to evaluate expressions should be included
in the REGNAMES statement; registers used only for environment pointers, temporary address calculations,
or other scratch calculations performed within the code for a single AMOP should not be included in the
REGNAMES statement. For example, on the HIS-6000, three index registers are not defined in the
REGNAMES statement: X7, which contains a pointer to the current stack frame, X6, which contains a
pointer to the current argument list, and X5, which is used as a scratch register by macros which access
characters.


1.3 The MEMNAMES Statement


The MEMNAMES statement associates names with the various classes of memory references, which are
represented by negative values of REF.BASE (section 2.1.1.2). The form of the MEMNAMES statement is
similar to that of the TYPENAMES statement; for example, the MEMNAMES statement used in the HIS-6000
implementation is


memnames (reg, auto, ext, stat, param, label, intlit, floatlit, stringlit, ix0, ix1, ix2, ix3, ix4, ia, iq);


The first nine names refer to predefined memory reference classes (REF.BASE = 0, -1, -2, ..., -8); the
remaining names refer to indirect references through the abstract machine registers defined in the
REGNAMES statement (REF.BASE = -9, -10, ...). The first name "reg" is never used; it serves only as a
placeholder. No name is provided for indirect references through the F register; since the F register is
the last register named, this omission does not affect the
positions of the other names in the list.


1.4 The SIZE Statement 


The SIZE statement defines the sizes of the primitive C data types in terms of bytes. The form of the
SIZE statement is


<size_stmt>:        size <size_def_list> ;
<size_def_list>:    <size_def_list> , <size_def>
                    <size_def>
<size_def>:         <integer> ( <type_list> )
<type_list>:        <type_list> , <type>
                    <type>


The integers specify sizes in bytes; the types are the names of primitive C data types (as specified in the
TYPENAMES statement) with the corresponding size. For example, the SIZE statement used in the HIS-
6000 implementation is


size 1(char),4(int,float),8(double);


All addresses computed by the compiler are in terms of byte addressing; byte addresses are converted to
word addresses for non-character operations by the macro definitions. For example, on the HIS-6000, if
the first element of an integer array begins at offset 0 in the static area, then subsequent elements of
the array are at offsets 4, 8, 12, 16, etc.
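

The arithmetic involved can be sketched as follows. The figure of 4 bytes per word is taken from the
HIS-6000 examples in this description; the function names element_offset and word_address are
hypothetical, and in the compiler itself the conversion to word addresses is performed by the macro
definitions rather than by a C routine.

```c
#include <assert.h>

#define BYTES_PER_WORD 4   /* HIS-6000: four bytes to the word */

/* Byte offset of element `index` of an array which starts at byte offset
   `base` and whose elements are `elem_size` bytes long. */
unsigned element_offset(unsigned base, unsigned index, unsigned elem_size)
{
    return base + index * elem_size;
}

/* Convert the byte address of a word-aligned object to a word address. */
unsigned word_address(unsigned byte_addr)
{
    assert(byte_addr % BYTES_PER_WORD == 0);   /* must lie on a word boundary */
    return byte_addr / BYTES_PER_WORD;
}
```

For an integer array at offset 0, element 3 lies at byte offset 12, which corresponds to word address 3.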


1.5 The ALIGN Statement


The ALIGN statement defines the alignment factors of the primitive C data types; these alignment factors
are in bytes. The (byte) address of a variable with an alignment factor "n" must be zero modulo "n"; for
example, on the HIS-6000, the (byte) address of an integer must be a multiple of 4. An alignment factor
must be divisible by all smaller alignment factors; this allows the compiler to assign addresses relative to
a base which satisfies the highest alignment restriction. The form of the ALIGN statement is similar to
that of the SIZE statement; for example, the ALIGN statement used in the HIS-6000 implementation is


align 1(char),4(int,float),8(double);
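

The two rules stated above, that an address must be zero modulo its alignment factor and that each factor
must be divisible by all smaller factors, can be sketched in C as follows. The function names align_up and
factors_ok are hypothetical, and the factors 1, 4, and 8 mentioned in the note after the code are assumed
from the HIS-6000 example.

```c
#include <assert.h>
#include <stddef.h>

/* Round the byte offset `offset` up to the next multiple of `align`. */
unsigned align_up(unsigned offset, unsigned align)
{
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

/* Return 1 if every alignment factor in f[0..n-1] is divisible by every
   smaller factor in the list, and 0 otherwise. A zero factor is rejected. */
int factors_ok(const unsigned f[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (f[j] < f[i] && (f[j] == 0 || f[i] % f[j] != 0))
                return 0;
    return 1;
}
```

With factors 1, 4, and 8 the divisibility rule holds, so an address that satisfies the factor 8 also
satisfies the factors 4 and 1; an integer following a single character at offset 0 would be placed at
offset 4.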


1.6 The CLASS Statement


The CLASS statement is an optional statement which allows the implementer to define classes of abstract
machine registers which are used in similar ways; the register classes so defined can then be used in the
machine description as abbreviations for the corresponding lists of registers. The form of the CLASS
statement is


<class_stmt>:       class <class_def_list> ;
<class_def_list>:   <class_def_list> , <class_def>
                    <class_def>
<class_def>:        <name> ( <register_list> )
<register_list>:    <register_list> , <register>
                    <register>


The name is the name of the register class, the registers are the names of the abstract machine registers 
(as specified in the REGNAMES statement) which make up the corresponding register class. The CLASS 
statement used in the HIS-6000 implementation is 

class x(x0,x1,x2,x3,x4), r(a,q); 
This statement defines the class of index registers X and the class of general registers R. 


1.7 The CONFLICT Statement 


The CONFLICT statement is an optional statement which allows the implementer to specify abstract 
machine registers which conflict in the actual implementation. The form of the CONFLICT statement is 


<conflict_stmt>:      conflict <conflict_def_list> ;

<conflict_def_list>:  <conflict_def_list> , <conflict_def>
                      <conflict_def>

<conflict_def>:       ( <register> , <register> )


Each register pair specifies two abstract machine registers such that only one of the registers can be in 
use at one time. The CONFLICT statement used in the HIS-6000 implementation is 


conflict (a,f), (q,f);
which indicates that the F register conflicts with both the A and Q registers. 
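
One way to picture the CLASS and CONFLICT information is as bit sets over the abstract machine registers. The following C sketch is an illustration only: the register numbering, the names `conflict_set` and `reg_is_free`, and the bit-set representation are assumptions, not the compiler's actual tables.

```c
#include <assert.h>

/* Hypothetical numbering of the HIS-6000 abstract machine registers. */
enum { X0, X1, X2, X3, X4, A, Q, F, NREGS };

#define BIT(r) (1u << (r))

/* class x(x0,x1,x2,x3,x4), r(a,q); */
const unsigned CLASS_X = BIT(X0) | BIT(X1) | BIT(X2) | BIT(X3) | BIT(X4);
const unsigned CLASS_R = BIT(A) | BIT(Q);

/* conflict (a,f), (q,f);  each row is one conflicting pair. */
const int conflicts[][2] = { { A, F }, { Q, F } };

/* Return the set of registers that cannot be in use while reg is in use. */
unsigned conflict_set(int reg)
{
    unsigned set = 0;
    assert(reg >= 0 && reg < NREGS);
    for (unsigned i = 0; i < sizeof conflicts / sizeof conflicts[0]; i++) {
        if (conflicts[i][0] == reg) set |= BIT(conflicts[i][1]);
        if (conflicts[i][1] == reg) set |= BIT(conflicts[i][0]);
    }
    return set;
}

/* A register is free if neither it nor any conflicting register is busy. */
int reg_is_free(int reg, unsigned busy)
{
    return (busy & (BIT(reg) | conflict_set(reg))) == 0;
}
```

Under this representation, F is unavailable whenever A or Q is busy, while the index registers are unaffected by the conflict pairs.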


1.8 The SAVEAREASIZE Statement


The SAVEAREASIZE statement is used to specify the size of the save area which is reserved at the 
beginning of each stack frame. The save area is generally used for saving registers upon entry to a 
function, for chaining stack frames together, and for holding other per-invocation information. The form 
of the SAVEAREASIZE statement is 


saveareasize <integer> ; 


The integer specifies the size (in bytes) of the save area. The save area used in the HIS-6000 
implementation is 16 bytes (4 words) long. 


1.9 The POINTER Statement 


The POINTER statement defines classes of pointers according to their resolution; these pointer classes
represent different implementations of pointers on the target machine. The resolution of a pointer
corresponds to the alignment factors of the objects to which it can refer; in particular, a pointer with a
resolution of "n" bytes can refer only to objects whose alignment factors are multiples of "n" bytes. The
primary use of pointer classes is on machines whose smallest addressable unit is larger than a byte; in this
case, two pointer classes are defined: one which can resolve only machine-addressable units and another
which can resolve individual bytes. By defining separate pointer classes, the implementer allows
computations involving pointers which are known to refer to machine-addressable units to be performed
in terms of machine-addressable units, and therefore more efficiently. The form of the POINTER
statement is

<pointer_stmt>:      pointer <pointer_def_list> ;

<pointer_def_list>:  <pointer_def_list> , <pointer_def>
                     <pointer_def>

<pointer_def>:       <name> ( <integer> )


The names define the names of the pointer classes; the integers are the resolutions of the corresponding
pointer classes. At least one and no more than four pointer classes may be defined; the pointer classes
are referred to as P0, P1, P2, and P3 in the specification of the AMOPs.


The POINTER statement used in the HIS-6000 implementation is

pointer p0(1), p1(4);

P0 is the class of pointers to byte-aligned objects; P1 is the class of pointers to word-aligned objects.
Word pointers can be held and operated upon in the index registers; byte pointers are held in
the general registers and indirected through by subroutine.
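
The relation between alignment factors and pointer resolutions can be sketched in C. This is an illustration under stated assumptions: the names `pointer_class_for` and `to_units` are hypothetical, and the policy of preferring the coarsest usable class is an assumption suggested by the efficiency argument above, not a documented rule of the compiler.

```c
#include <assert.h>

/* pointer p0(1), p1(4);  resolutions in bytes, indexed by class number. */
const int resolution[] = { 1, 4 };
enum { NCLASSES = sizeof resolution / sizeof resolution[0] };

/* Pick the pointer class with the coarsest resolution that can still
   refer to an object with the given alignment factor.  A class is usable
   only if the alignment factor is a multiple of its resolution. */
int pointer_class_for(int alignment)
{
    int best = -1;
    assert(alignment > 0);
    for (int c = 0; c < NCLASSES; c++)
        if (alignment % resolution[c] == 0 &&
            (best < 0 || resolution[c] > resolution[best]))
            best = c;
    return best;
}

/* Express a byte offset in units of a pointer class's resolution. */
int to_units(int byte_offset, int cls)
{
    assert(cls >= 0 && cls < NCLASSES);
    assert(byte_offset % resolution[cls] == 0);
    return byte_offset / resolution[cls];
}
```

For the HIS-6000 classes, a character (alignment 1) needs a P0 pointer, while integers and doubles (alignments 4 and 8) can be reached through P1 and addressed in word units.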


1.10 The OFFSETRANGE Statement 


The OFFSETRANGE statement is an optional statement which defines, for each pointer class defined in the
POINTER statement, the range of offsets permitted in references indirect via such a pointer (see section
2.1.1.2). The form of the OFFSETRANGE statement is


<offsetrange_stmt>:  offsetrange <offset_def_list> ;

<offset_def_list>:   <offset_def_list> , <offset_def>
                     <offset_def>

<offset_def>:        <pointer_class_name> ( <lo_bound> , <hi_bound> )


where the lo_bounds and hi_bounds are optional integers. Each offset_def specifies the range of
allowable offsets for a particular pointer class; this range is the set of integers not less than lo_bound
and not greater than hi_bound. If a bound is not present, then the range is considered unbounded in the
corresponding direction. If no range is specified for a pointer class, then only a zero offset is permitted;
any specified range must include zero.
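
The range test described above is simple enough to sketch in C. The structure and function names are hypothetical, and the use of `INT_MIN` and `INT_MAX` to stand for an absent bound is an assumption made for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* An offset range; INT_MIN or INT_MAX stands for an absent bound. */
struct offset_range { int lo, hi; };

/* Any specified range must include zero. */
int range_is_valid(struct offset_range r)
{
    return r.lo <= 0 && 0 <= r.hi;
}

/* An offset is allowed if it is not less than lo and not greater than hi. */
int offset_allowed(struct offset_range r, int offset)
{
    assert(range_is_valid(r));
    return r.lo <= offset && offset <= r.hi;
}
```

A definition with only a lower bound of zero would be represented as `{ 0, INT_MAX }`, which admits every non-negative offset and rejects negative ones.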


1.11 The RETURNREG Statement 


The RETURNREG statement specifies in which registers functions returning values of various types return
those values. Registers must be specified for types INT and DOUBLE as well as for all pointer classes
defined in the POINTER statement. The form of the RETURNREG statement is


<returnreg_stmt>:   returnreg <return_def_list> ;

<return_def_list>:  <return_def_list> , <return_def>
                    <return_def>

<return_def>:       <register> ( <type_list> )


The types may be names of primitive C data types as defined in the TYPENAMES statement or names of
pointer classes as defined in the POINTER statement; the corresponding register is defined to be the
register in which functions returning values of those types return those values. For example,
the RETURNREG statement used in the HIS-6000 implementation is

returnreg q(int,p0,p1), f(double);


It is advised that pointers of all classes be returned in the same register in a compatible form to avoid
errors caused by mismatches in the declarations of functions returning pointers.


1.12 The TYPE Statement


The TYPE statement defines which registers are to be used in the evaluation of expressions to hold
values of the various abstract machine data types. The form of the TYPE statement is


<type_stmt>:      type <type_def_list> ;

<type_def_list>:  <type_def_list> , <type_def>
                  <type_def>

<type_def>:       <type> ( <register_list> )


The type is the name of a primitive C data type as defined in the TYPENAMES statement or the name of a
pointer class as defined in the POINTER statement; the registers are the abstract machine registers or
classes of abstract machine registers which may be used to hold values of the corresponding type. For
example, the TYPE statement used in the HIS-6000 implementation is


type char(r),int(r),float(f),double(f),p0(r),p1(x);


The registers specified in the TYPE statement need not include every register physically capable of
holding a particular type; only those registers which the implementer desires to use in evaluating
expressions of that type should be included in the TYPE statement. In the HIS-6000 example, only the
index registers are specified for the pointer class P1 even though the general registers (R) are
capable of holding such pointers and, in fact, a general register (the Q register) is used to hold such a
pointer when returned by a function call; this was done in order to minimize the use of the
general registers, which are relatively few in number.


2. The OPLOC Section


In the OPLOC section of the machine description, the AMOPs are defined in terms of the possible locations
of their operands and the corresponding locations of their results. The section consists of a list of
triples called OPLOCs; an OPLOC specifies a particular set of first operand locations, second operand
locations, and result locations. An OPLOC may also specify that one or more registers are clobbered by
the execution of the code for an abstract machine instruction; this informs the code generator that it may
be necessary to emit instructions to save the contents of the clobbered registers before emitting the
abstract machine instruction. The forms of an OPLOC are


<loc_expr> , <loc_expr> , <loc_expr> ;

and

<loc_expr> , <loc_expr> , <loc_expr> <clobber> ;


where a clobber is a list of one or more register names separated by commas and enclosed in square
brackets. The location expressions specify locations for the first operand, second operand, and result,
respectively. A location expression specifies either a set of registers or a set of memory reference
classes; these sets may be specified using particular registers or memory reference classes along with
the operations of union ('|') and negation ('~'). The syntax of a location expression is


<loc_expr>:       <register_expr>
                  <memory_expr>
                  1
                  2
                  <null>

<register_expr>:  <register_expr> | <register_expr>
                  ~ <register_expr>
                  ( <register_expr> )
                  <register_name>
                  <register_class_name>

<memory_expr>:    <memory_expr> | <memory_expr>
                  ~ <memory_expr>
                  ( <memory_expr> )
                  <memory_ref_class_name>
                  M
                  indirect


The negation operator '~' has precedence over the union operator '|'. The location expressions "1" and
"2" may be used only for the location of a result; they specify that the result is placed in the first or
second operand location, respectively. Only the location expression for the second operand of a unary
AMOP may be null. The location expression "M" represents the set of all memory reference classes; the
location expression "indirect" represents the set of all indirect memory reference classes.
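
Since a location expression denotes a set, its operators can be modelled with bit operations. The C sketch below covers register expressions only and is purely illustrative: the register numbering and the function names are assumptions, and the compiler's own representation is not described in this section.

```c
#include <assert.h>

/* Hypothetical numbering of the HIS-6000 abstract machine registers. */
enum { X0, X1, X2, X3, X4, A, Q, F, NREGS };

#define BIT(n)   (1u << (n))
#define ALL_REGS (BIT(NREGS) - 1u)

/* Union of two register sets: the '|' operator. */
unsigned reg_union(unsigned a, unsigned b)
{
    return a | b;
}

/* Negation of a register set, taken relative to all registers: '~'. */
unsigned reg_negate(unsigned a)
{
    return ~a & ALL_REGS;
}

/* Test whether a particular register belongs to a set. */
int reg_in_set(int reg, unsigned set)
{
    assert(reg >= 0 && reg < NREGS);
    return (int)((set >> reg) & 1u);
}
```

Because negation binds more tightly than union, an expression such as `~x0|a` would be evaluated as `reg_union(reg_negate(BIT(X0)), BIT(A))`, which contains every register except X0.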


The OPLOCs are associated with AMOPs in location definitions, which consist of one or more AMOP labels
followed by one or more OPLOCs:


<loc_def>:     <AMOP_list> <OPLOC_list>

<AMOP_list>:   <AMOP_list> <AMOP_label>
               <AMOP_label>

<AMOP_label>:  <AMOP> :

<OPLOC_list>:  <OPLOC_list> <OPLOC>
               <OPLOC>


Each AMOP in the list of AMOP labels is associated with the list of OPLOCs; each OPLOC in the list of
OPLOCs represents an acceptable set of operand/result locations for each of the AMOPs. For example,
the location definition


+d: -d: *d: /d: f,M,f;

used in the HIS-6000 machine description specifies that the AMOPs for double-precision floating-point
addition, subtraction, multiplication, and division all take their first operand in the F register, their second
operand in memory, and place their result in the F register. Another example is the location definition

=<<: =>>: M,a,q; M,q,a;

which specifies that the AMOPs left-shift-assignment and right-shift-assignment both take their first
operand in memory, their second operand in a general register, and place their result in the other general
register. A third example is the location definition

*i: /i: q,M,q[a];

which specifies that the AMOPs for integer multiplication and division both take their first operand in the
Q register, their second operand in memory, place their result in the Q register, and clobber the contents
of the A register in the process. Note that the location definitions


+i: r,M,1;

and

+i: r,M,r;

are not equivalent. The second definition allows the code generator to emit an abstract machine
instruction which adds an integer in memory to an integer in the A register and places the result in the Q
register; the first definition requires that the result be placed in the register containing the first operand.


The OPLOC section of the machine description consists of a sequence of location definitions which define
the AMOPs of the intermediate language. (A small number of AMOPs should not be defined in the OPLOC
section of the machine description; these are indicated in Appendix II.) An AMOP may not be defined more
than once in the OPLOC section of the machine description.


3. The Macro Section


The macro section of the machine description contains the macro definitions for the AMOPs; these macro
definitions expand into the object language statements needed to interpret the corresponding abstract
machine instructions. A macro definition consists of a list of AMOP labels followed by a list of character
string constants. The list of AMOP labels specifies that abstract machine instructions for these AMOPs are
to be emitted as macro calls which refer to this macro definition. The character strings make up the body
of the macro definition; they are written out in sequence as the expansion of a corresponding macro call.
The character strings may have optional location prefixes which test for a specific set of locations of the
operands and result; a character string with an attached location prefix is included in the expansion of the
macro call only if the test specified by the location prefix succeeds. A character string may contain
embedded macro calls and references to the arguments of the macro call (see Appendix VI, section 4).


The macro definition for an AMOP must correspond to the location definition of the AMOP in that correct
code must be generated for all combinations of operand/result locations allowed by the location
definition.


The macro definitions can refer to the AMOP and the operand/result locations by using the following
abbreviations:


abbreviation   expansion     meaning

*O             %n(#0)        symbolic representation of operation
*F             %n(#3,#4)     symbolic representation of first operand
*S             %n(#5,#6)     symbolic representation of second operand
*R             %n(#1,#2)     symbolic representation of result
#O             #0            internal representation of operation
#F             #3,#4         internal representation of first operand
#S             #5,#6         internal representation of second operand
#R             #1,#2         internal representation of result


Recall that in the intermediate language representation of an abstract machine instruction, the first
argument of the macro call is the AMOP opcode, and the following arguments are REFs for the result, first
operand, and second operand (see section 2.1.1.2). The macro "n" is the implementer-defined NAME
macro which can return any convenient symbolic representation for an operation or operand/result
location; it is assumed to be implemented as a C routine called ANAME (see Appendix VI, section 4).
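
The abbreviations amount to a fixed substitution table. The following C sketch shows such a lookup; it assumes that the symbolic abbreviations are written with '*' and the internal ones with '#', which is a reading of the abbreviation table rather than something confirmed elsewhere in this section, and the function name `expand_abbrev` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Return the expansion of a two-character abbreviation, or NULL if the
   string is not one of the eight abbreviations. */
const char *expand_abbrev(const char *abbr)
{
    static const struct { const char *abbr, *expansion; } table[] = {
        { "*O", "%n(#0)"    },  /* symbolic operation      */
        { "*F", "%n(#3,#4)" },  /* symbolic first operand  */
        { "*S", "%n(#5,#6)" },  /* symbolic second operand */
        { "*R", "%n(#1,#2)" },  /* symbolic result         */
        { "#O", "#0"        },  /* internal operation      */
        { "#F", "#3,#4"     },  /* internal first operand  */
        { "#S", "#5,#6"     },  /* internal second operand */
        { "#R", "#1,#2"     },  /* internal result         */
    };
    assert(abbr != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].abbr, abbr) == 0)
            return table[i].expansion;
    return NULL;
}
```

The argument numbers follow the order stated in the text: argument 0 is the opcode, arguments 1 and 2 form the REF for the result, 3 and 4 the first operand, and 5 and 6 the second operand.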


An example of a simple macro definition is the definition for integer addition used in the HIS-6000
machine description. The location definition is


+i: r,M,1;


and the macro definition is

+i:     "       AD*R    *S"

This location/macro definition of the AMOP '+i' expands to produce assembly language statements such as

        ADA     X       (external variable "X")
        ADQ     3,DL    (literal "3")
        ADA     0,2     (indirect through X2)
        ADQ     5,7     (an automatic or temporary)


A more complicated macro definition is used for the AMOP '.ii' (move integer). This macro definition must
be capable of generating code to move an integer between a memory location and a general register or
from one general register to the other. Three character strings with location prefixes are used for the
three cases register-to-memory, memory-to-register, and register-to-register:


.ii:    (r,,M):     "       ST*F    *R"
        (M,,r):     "       LD*R    *F"
        (r,,r):     "       LLR     36"


The location prefixes consist of location expressions for the first operand, second operand, and result.
The operand and result locations of a particular macro call are compared to the location expressions in
the location prefix (comparisons with a null location expression always succeed); if all three comparisons
succeed, the corresponding character string is included in the expansion of the macro call.
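
The prefix test can be sketched in C as follows. This is an illustration, not the macro expander itself: locations are reduced to two hypothetical bits (general register and memory), the strings are descriptive placeholders rather than HIS-6000 instructions, and the names `prefixed_string`, `loc_matches`, and `expand` are assumptions.

```c
#include <assert.h>
#include <string.h>

#define LOC_NULL 0u          /* null location expression */
#define LOC_R    (1u << 0)   /* a general register */
#define LOC_M    (1u << 1)   /* a memory reference */

struct prefixed_string {
    unsigned first, second, result;   /* location prefix */
    const char *text;                 /* string emitted when it matches */
};

/* Placeholder body for a move-integer macro with three prefixed strings. */
const struct prefixed_string move_integer[] = {
    { LOC_R, LOC_NULL, LOC_M, "register-to-memory\n"   },
    { LOC_M, LOC_NULL, LOC_R, "memory-to-register\n"   },
    { LOC_R, LOC_NULL, LOC_R, "register-to-register\n" },
};

/* A comparison succeeds if the prefix entry is null or shares a member
   with the actual location. */
int loc_matches(unsigned prefix, unsigned actual)
{
    return prefix == LOC_NULL || (prefix & actual) != 0;
}

/* Append every string whose prefix matches all three locations.
   Returns the number of characters written, excluding the terminator. */
size_t expand(const struct prefixed_string *body, size_t n,
              unsigned first, unsigned second, unsigned result,
              char *out, size_t outsize)
{
    size_t used = 0;
    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len;
        if (!loc_matches(body[i].first, first) ||
            !loc_matches(body[i].second, second) ||
            !loc_matches(body[i].result, result))
            continue;
        len = strlen(body[i].text);
        if (used + len >= outsize)
            break;                    /* output buffer full: stop early */
        memcpy(out + used, body[i].text, len + 1);
        used += len;
    }
    return used;
}
```

Every matching string is included, in order, so a body may mix unconditional strings (all-null prefixes) with strings selected by location.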


The macro section may also contain definitions of named macros; these may be
keyword macros (see section 2.1.2) or implementer-defined macros which are called in the definitions of
other macros. A named macro is defined by using the name of the macro in place of an AMOP in the
label(s) preceding the body of the macro definition. A single macro definition may have both AMOP and
macro name labels; this is useful when it is desired that the definition of one abstract machine instruction
itself contain another abstract machine instruction, since the "internal" names used to refer to the macro
definitions of AMOPs are not accessible to the writer of the machine description. An example of a
keyword macro definition in the HIS-6000 machine description is that for the ENTRY macro:


en:     "       SYMREF  #0"


The argument to the ENTRY macro is an assembler symbol as produced by the IDN macro (see Appendix
III).


The macro section of the machine description consists of the reserved word "macros" followed by a
sequence of macro definitions. Macro definitions must be provided for each of the AMOPs of the
intermediate language (exceptions are indicated in Appendix II) and for all of the keyword macros of the
intermediate language which are not defined by C routines. An AMOP or a macro name may not be
defined more than once in the macro section of the machine description.


Appendix II - The Intermediate Language: AMOPs


The operations of the abstract machine are represented in the intermediate language as three-address
instructions; the operators of these instructions, called abstract machine operators (AMOPs), are described
in the tables below. For each AMOP is listed its opcode (in octal), its symbolic representation in the
machine description, the types of its operands and result, and a description of the basic operation
involved. The type entry consists of a list of types for the first operand, second operand (if any), and
result of an AMOP, in that order; the types are taken from the following list of abbreviations:


c     character
i     integer
f     floating-point
d     double-precision floating-point
x     any type
p     any pointer
p0    class 0 pointer
p1    class 1 pointer
p2    class 2 pointer
p3    class 3 pointer
l     a location (the result of a jump)

The following notes are referenced in the AMOP tables:

1. This AMOP should be defined only if the corresponding pointer classes are defined.
2. The definition of this AMOP is optional.
3. OPLOCs should not be specified for this AMOP.
4. This AMOP is used only in the tree representation of expressions internal to the code
   generation phase; it should not appear in the machine description.
5. This AMOP causes a side-effect on its (first) operand, which must be an lvalue;
   therefore, all OPLOCs for this AMOP must specify M as the location of the (first)
   operand.


Unary Abstract Machine Operators

opcode  symbol  types   notes   basic operation

0000    -ui     i,i             unary minus
0001    -ud     d,d             unary minus
0002    ++bi    i,i     5       pre-increment
0003    ++ai    i,i     5       post-increment
0004    --bi    i,i     5       pre-decrement
0005    --ai    i,i     5       post-decrement
0006    .BNOT   i,i             bitwise negation
0007    !       x,i     4       truth-value negation
0012    .sw     i,l             switch
0013    ++bc    c,i     5       pre-increment
0014    ++ac    c,i     5       post-increment
0015    --bc    c,i     5       pre-decrement
0016    --ac    c,i     5       post-decrement
0017    &u0     x,p0            address of
0020    &u1     x,p1    1       address of
0021    &u2     x,p2    1       address of
0022    &u3     x,p3    1       address of
0023    *u      p,x     4       indirection
0024    ==0p0   p0,l    2       jump on null pointer
0025    ==0p1   p1,l    1,2     jump on null pointer
0026    ==0p2   p2,l    1,2     jump on null pointer
0027    ==0p3   p3,l    1,2     jump on null pointer
0030    !=0p0   p0,l    2       jump on non-null pointer
0031    !=0p1   p1,l    1,2     jump on non-null pointer
0032    !=0p2   p2,l    1,2     jump on non-null pointer
0033    !=0p3   p3,l    1,2     jump on non-null pointer


Conversion Abstract Machine Operators

opcode  basic operation

0040    convert c to i
0041    convert c to f
0042    convert c to d
0043    convert i to c
0044    convert i to f
0045    convert i to d
0046    convert i to p0
0047    convert i to p1
0050    convert i to p2
0051    convert i to p3
0052    convert f to c
0053    convert f to i
0054    convert f to d
0055    convert d to c
0056    convert d to i
0057    convert d to f
0060    convert p0 to i
0061    convert p0 to p1
0062    convert p0 to p2
0063    convert p0 to p3
0064    convert p1 to i
0065    convert p1 to p0
0066    convert p1 to p2
0067    convert p1 to p3
0070    convert p2 to i
0071    convert p2 to p0
0072    convert p2 to p1
0073    convert p2 to p3
0074    convert p3 to i
0075    convert p3 to p0
0076    convert p3 to p1
0077    convert p3 to p2


Binary Abstract Machine Operators

opcode  symbol  types     notes   basic operation

0100    +i      i,i,i             addition
0101    =+i     i,i,i     2,5     addition-assignment
0102    +d      d,d,d             addition
0103    =+d     d,d,d     2,5     addition-assignment
0104    -i      i,i,i             subtraction
0105    =-i     i,i,i     2,5     subtraction-assignment
0106    -d      d,d,d             subtraction
0107    =-d     d,d,d     2,5     subtraction-assignment
0110    *i      i,i,i             multiplication
0111    =*i     i,i,i     2,5     multiplication-assignment
0112    *d      d,d,d             multiplication
0113    =*d     d,d,d     2,5     multiplication-assignment
0114    /i      i,i,i             division
0115    =/i     i,i,i     2,5     division-assignment
0116    /d      d,d,d             division
0117    =/d     d,d,d     2,5     division-assignment
0120    %       i,i,i             modulo
0121    =%      i,i,i     2,5     modulo-assignment
0122    <<      i,i,i             left-shift
0123    =<<     i,i,i     2,5     left-shift-assignment
0124    >>      i,i,i             right-shift
0125    =>>     i,i,i     2,5     right-shift-assignment
0126    &       i,i,i             bitwise AND
0127    =&      i,i,i     2,5     bitwise AND-assignment
0130    ^       i,i,i             bitwise XOR
0131    =^      i,i,i     2,5     bitwise XOR-assignment
0132    OR      i,i,i             bitwise OR
0133    =OR     i,i,i     2,5     bitwise OR-assignment
0134    &&      x,x,i     4       truth-value AND
0135    .TVOR   x,x,i     4       truth-value OR
0136    -p0p0   p0,p0,i           pointer subtraction
0137    =       x,x,x     4       assignment
0146    +p0     p0,i,p0           increment pointer by
0147    +p1     p1,i,p1   1       increment pointer by
0150    +p2     p2,i,p2   1       increment pointer by
0151    +p3     p3,i,p3   1       increment pointer by
0152    -p0     p0,i,p0           decrement pointer by
0153    -p1     p1,i,p1   1       decrement pointer by
0154    -p2     p2,i,p2   1       decrement pointer by
0155    -p3     p3,i,p3   1       decrement pointer by


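Abstract Machine Operators, continued

opcode  symbol  basic operation

0160    .cc     move character
0161    .ii     move integer
0162    .ff     move float
0163    .dd     move double
0164    .p0p0   move pointer p0
0165    .p1p1   move pointer p1
0166    .p2p2   move pointer p2
0167    .p3p3   move pointer p3
0171    ?       conditional
0172            conditional
0200    ==i     jump on equal
0201    !=i     jump on not equal
0202    <i      jump on less than
0203    >i      jump on greater than
0204    <=i     jump on less than or equal
0205    >=i     jump on greater than or equal
0206    ==d     jump on equal
0207    !=d     jump on not equal
0210    <d      jump on less than
0211    >d      jump on greater than
0212    <=d     jump on less than or equal
0213    >=d     jump on greater than or equal
0214    ==p0    jump on equal
0215    !=p0    jump on not equal
0216    <p0     jump on less than
0217    >p0     jump on greater than
0220    <=p0    jump on less than or equal
0221    >=p0    jump on greater than or equal
0222    ==p1    jump on equal
0223    !=p1    jump on not equal
0224    <p1     jump on less than
0225    >p1     jump on greater than
0226    <=p1    jump on less than or equal
0227    >=p1    jump on greater than or equal
0230    ==p2    jump on equal
0231    !=p2    jump on not equal
0232    <p2     jump on less than
0233    >p2     jump on greater than
0234    <=p2    jump on less than or equal
0235    >=p2    jump on greater than or equal
0236    ==p3    jump on equal
0237    !=p3    jump on not equal
0240    <p3     jump on less than
0241    >p3     jump on greater than
0242    <=p3    jump on less than or equal
0243    >=p3    jump on greater than or equal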


Abstract Machine Operators, continued

symbol  basic operation

++bp0   pre-increment by
++ap0   post-increment by
--bp0   pre-decrement by
--ap0   post-decrement by
++bp1   pre-increment by
++ap1   post-increment by
--bp1   pre-decrement by
--ap1   post-decrement by
++bp2   pre-increment by
++ap2   post-increment by
--bp2   pre-decrement by
--ap2   post-decrement by
++bp3   pre-increment by
++ap3   post-increment by
--bp3   pre-decrement by
--ap3   post-decrement by


Appendix III - The Intermediate Language: Keyword Macros


The keyword macros of the intermediate language are described below in alphabetical order. Each 
section is headed by the name of a macro and its calling sequence; following is a description of the 
arguments and the intended function of the macro call. 


1. ADCONn: %An(NAME) [n=0,1,2,3]


This is a set of macros, one for each possible pointer class. NAME is an object-language symbol
constructed from an identifier by the IDN macro. The expansion of an ADCONn macro should define a
pointer constant of pointer class "n" which points to the external variable or function with the given
name. This macro is used in the initialization of static and external pointers and arrays of pointers.


2. ALIGN: %AL(N)


N is an integer specifying the CTYPE (an internal type specification) of an object for which the 
appropriate alignment of the location counter must be made. The relevant CTYPEs are: 


value ctype 


2 char 

3 int 

4 float 

5 double 
6-9 pointer 


The expansion of the macro call should be the pseudo-operations needed (if any) to properly align the 
location counter. This macro is used in the initialization of static and external variables. 


3. CALL: %CA(NARGS,ARGP,0,FBASE,FOFFSET)


The CALL macro generates a function call. NARGS is an integer specifying the number of arguments to
the function call; ARGP is an integer specifying the byte offset in the caller's stack frame of the
arguments which have been so placed by previous instructions. FBASE and FOFFSET are integers which
together make up a REF specifying the location of the function being called (it may be indirect through a
pointer in a register); these are passed as arguments 3 and 4 of the macro call so that they may be
referenced as #F in the macro definition.


4. CHAR: %C(I)


The CHAR macro produces a definition of a character constant whose value is the integer I; it is used in 
the initialization of static and external characters and arrays of characters. When producing code for an 
assembler which does not have a byte location counter (for example, the HIS-6000 assembler GMAP), the 
characters produced by CHAR macro calls must be stored in a buffer until either enough are accumulated 
to fill a machine word or a macro call other than CHAR is issued; in this case, all macros which may follow 
a CHAR macro must first check to see if there are any characters in the buffer and if so, print the 
appropriate statement and clear the buffer. 
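
The buffering scheme can be sketched in C as below. This is an illustration under stated assumptions: it assumes four characters per machine word (consistent with the 4-byte integer size given for the HIS-6000), the pseudo-operation names WORD and DEC in the output are placeholders rather than statements taken from the actual macro definitions, and the function names `flush_chars`, `char_macro`, and `int_macro` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define CHARS_PER_WORD 4      /* assumption: four characters per word */

int buffer[CHARS_PER_WORD];   /* characters waiting to be written */
int nbuffered = 0;            /* how many of them are valid */
int words_emitted = 0;        /* counts flushed words, for inspection */

/* Write any buffered characters as one word, padding with zeros. */
void flush_chars(FILE *out)
{
    if (nbuffered == 0)
        return;
    fprintf(out, "\tWORD\t");             /* placeholder pseudo-op */
    for (int i = 0; i < CHARS_PER_WORD; i++)
        fprintf(out, "%s%d", i ? "," : "", i < nbuffered ? buffer[i] : 0);
    fputc('\n', out);
    nbuffered = 0;
    words_emitted++;
}

/* CHAR: buffer one character; flush when a full word has accumulated. */
void char_macro(FILE *out, int value)
{
    assert(nbuffered < CHARS_PER_WORD);
    buffer[nbuffered++] = value;
    if (nbuffered == CHARS_PER_WORD)
        flush_chars(out);
}

/* Any macro that may follow CHAR must flush first; INT is shown here. */
void int_macro(FILE *out, int value)
{
    flush_chars(out);
    fprintf(out, "\tDEC\t%d\n", value);   /* placeholder pseudo-op */
}
```

Five CHAR calls followed by an INT call would therefore produce one full word, one zero-padded word, and then the integer constant.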


5. DOUBLE: %D(I)


The DOUBLE macro produces a definition of a non-negative double-precision floating-point constant 
whose C source representation is stored in the internal compiler table CSTORE at an offset specified by 
the integer I (the compiler itself does not use any floating-point operations). This macro is used in the 
initialization of static and external double-precision floating-point variables and arrays. 


6. END: %END()


The END macro marks the end of the intermediate language program. It may produce an END statement, if
needed, or signal that any processing associated with the end of the program should be performed.


7. ENTRY: %EN(NAME)


NAME is an object language symbol constructed from an identifier by the IDN macro. The expansion of
the ENTRY macro should define the symbol as an entry point, that is, one which is defined in the current
program but accessible to other programs.


8. EPILOG: %EP(FUNCNO,FRAMESIZE)


The EPILOG macro produces the epilog code for a C function. The epilog code should restore the
environment of the calling function and return to that function. In the HIS-6000 implementation, these
actions are performed by a subroutine. FUNCNO and FRAMESIZE are integers which specify the internal
function number of the function and the size in bytes of its stack frame, respectively. In the HIS-6000
implementation, these integers are used to define an assembly-language symbol whose value is the size in
words of the stack frame; this symbol is used by the prolog code in allocating
the stack frame.


9. EQU: %EQ(NAME)


NAME is an object language symbol constructed from an identifier by the IDN macro; it is to be defined as
having a value equal to the current value of the location counter.


10. EXTRN: %EX(NAME)


The EXTRN macro is similar to the ENTRY macro except that it defines the symbol to be an external
reference, that is, one which is used in the current program but assumed to be defined in another
program.


11. FLOAT: %F(I)


The FLOAT macro produces a definition of a non-negative single-precision floating-point constant; the
argument has the same interpretation as that of the DOUBLE macro.


12. GOTO: %GO(0,BASE,OFFSET)


The GOTO macro produces an unconditional jump to a location denoted in the source program by a label
constant or expression. BASE and OFFSET together make up a REF which specifies the target location of
the jump; these are passed as arguments 1 and 2 of the macro call so that they may be referenced as #R
in the macro definition.


13. HEAD: %HD()


The HEAD macro marks the beginning of the intermediate language program. It may produce header
statements, if needed, or signal that any initialization processing should be performed.


14. IDN: %I(X)


The IDN macro should expand to the object language representation of the identifier whose C source
representation is stored in the internal compiler table CSTORE at an offset specified by the integer X.
The processing performed by this macro may include the truncation of long names, the replacement of the
underline character (which is allowed in C identifiers), and the insertion of special character(s) to avoid
conflicts between C identifiers and other object language symbols.


15. INT: %IN(I)


The INT macro produces a definition of an integer constant whose value is specified by the integer I. It
is used in the initialization of static and external integer variables and arrays and in the construction of
tables for the LSWITCH macro.


16. LABCON: %LC(N)


The LABCON macro produces an address constant whose value is the address corresponding to internal
label number N. The LABCON macro is used to construct the tables for the LSWITCH and TSWITCH
macros.


17. LABDEF: %L(N)


The LABDEF macro defines the location of internal label number N.


18. LN: %LN(N)


The LN macro associates the line in the source program whose line number is specified by the integer N
with the current value of the location counter. This macro may optionally produce a comment line in the
object program to aid in the reading of the object program, or it may define a line-number symbol to be
used in conjunction with a debugging system.


19. LSWITCH: %LS(N,LBASE,LOFFSET,IBASE,IOFFSET)


The LSWITCH macro should generate code which jumps according to the value of the integer whose
location is given by IBASE and IOFFSET (selected from the locations permitted by the OPLOC for the .sw
operation). This macro is immediately followed by N (N>0) INT macros (the case list) which are immediately
followed by N LABCON macros (the corresponding labels). A search should be made through the case list;
if a match is found, a jump should be made to the label defined by the corresponding LABCON macro. If
the integer matches none of the list entries, then a jump should be made to the internal label specified by
LBASE and LOFFSET.
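
The behaviour required of the generated code can be stated as a short C function. This is a model of the semantics only, with labels represented as plain integers; the name `lswitch_target` and its parameters are hypothetical, and a linear search is used because the text does not prescribe a search method.

```c
#include <assert.h>

/* Return the label selected by an LSWITCH-style search: the label paired
   with the first case equal to value, or default_label if none matches. */
int lswitch_target(int value, const int *cases, const int *labels, int n,
                   int default_label)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (cases[i] == value)
            return labels[i];
    return default_label;
}
```

Here `cases` corresponds to the values of the N INT macros and `labels` to the label numbers of the N LABCON macros that follow them.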


20. NDOUBLE: %ND(I)


The NDOUBLE macro is the same as the DOUBLE macro except that the value of the defined constant is
made negative.


21. NFLOAT: %NF(I)


The NFLOAT macro is the same as the FLOAT macro except that the value of the defined constant is made
negative.


22. PROLOG: %P(FUNCNO,FUNCNAME)


The PROLOG macro produces the prolog code for a C function. FUNCNAME is an integer representing the
name of the function as it appears in the source program; its interpretation is the same as that of the
argument of the IDN macro. FUNCNO is an integer which specifies the internal function number of the
function; it may be used in conjunction with the EPILOG macro to access the size of the function's stack
frame. The PROLOG macro should define the entry point name and produce the code necessary to save 
the environment of the calling function and to set up the environment of the called function using the 
information provided in the function call. In the HIS-6000 implementation, these actions are performed by 
a subroutine. The PROLOG macro call appears in the intermediate language program immediately before 
the first instruction of the corresponding function. 


23. RETURN: %RT()


The RETURN macro produces the statements needed to return from a function to the calling function; in
general, this macro will result in a transfer to the EPILOG code. The returned value of the function is
loaded by preceding macro calls into the appropriate register as specified in the RETURNREG statement of
the machine description.


24. STATIC: %ST(N,S)


The STATIC macro defines the location of the static variable whose internal static variable number is N. S
is the size of the static variable in bytes. Typically, this macro will define an assembly language symbol
by which the static variable can be referenced.


25. STRCON: %SC(N)


The STRCON macro should generate a character pointer which points to the string constant whose
internal string number is N. The STRCON macro is used in the initialization of static and external
variables.


26. STRING: %SR()


The STRING macro marks the place in the object program where the string constants should be defined.
This macro is implemented as a C routine macro since substantial processing is required.


27. TSWITCH: %TS(LO,LBASE,LOFFSET,IBASE,IOFFSET,HI)


The TSWITCH macro produces an indexed jump based on the value of the integer whose location is given
by IBASE and IOFFSET. This macro is immediately followed by a sequence of HI-LO+1 LABCON macros
defining the target labels corresponding to integer values from LO to HI. Values outside this range
should result in transfers to the internal label defined by LBASE and LOFFSET.
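
The run-time behavior that TSWITCH is expected to produce can be sketched in C. This is only an illustration of the indexed jump, not code from the compiler: the function name, the use of integers to stand for labels, and the parameter names are all assumptions made for the example.

```c
/* Hypothetical model of the TSWITCH jump.
 * targets[k] stands for the label named by the k-th LABCON macro
 * (the one for the value lo + k); default_label stands for the
 * internal label defined by LBASE and LOFFSET. */
int tswitch_target(int value, int lo, int hi,
                   const int *targets, int default_label)
{
    if (value < lo || value > hi)
        return default_label;       /* outside LO..HI: default label */
    return targets[value - lo];     /* table holds hi - lo + 1 entries */
}
```

With LO = 4 and HI = 6 the table has three entries, and a value of 5 selects the second of them.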


28. ZERO: %Z(I)


The ZERO macro specifies the definition of a block of storage initialized to zero; the size in bytes of
the area is specified by the integer I.




Appendix IV - The HIS-6000 Machine Description


The machine description used in the HIS-6000 implementation is listed below. Much of its complexity is a
direct result of the fact that the HIS-6000 is not byte-addressed. In the macro definitions, the character
sequence '\n' represents the newline character.


typenames (char,int,float,double);
regnames (x0,x1,x2,x3,x4,a,q,f);


memnames (reg,auto,ext,stat,param,label,intlit,floatlit,stringlit,ix0,ix1,ix2,ix3,ix4,ia,iq);


size 1(char),4(int,float),8(double);

align 1(char),4(int,float),8(double);

class x(x0,x1,x2,x3,x4), r(a,q);

conflict (a,f),(q,f);

saveareasize 16;

pointer p0(1), p1(4);

returnreg q(int,p0,p1),f(double);

type char(r),int(r),float(f),double(f),p0(r),p1(x);


==p: !=p: <p: >p; <ap: >=p: 


Swe Oy l{x4} 
+p0: -p0: +i:-i:&:A: OR: -pOp0: <<; >»: rMi; 
+pl: Mx; 
-pl: x0, 1; 
=+i: ©&: =A: =OR: My, 15 
wi: /is aMedi 
+d: -d: ad: /d: Mf; 

: aMa[q 
m<<; m>> oe Mass 
&u: 
stojnttaistrngitiaan 
-BNOT: .ies ci: el3 

“eul: --bi: Mr; 
Cf: .cd: if: .id: a, f; 
-fe: .de: .fis .di £05 
fd:  Myfs 
df: -ud: f,1; 
ip0: .pOi: Cyl; 
M,r; 
ipl: .pOpl: PX 
Mx; 
-Pli: .pl pd: Xn 
Mr; 
++bi: M,1; 
++ai; --ai: ++bc: ++ac: 
--be: ~-ac: Maa} 
Mala} - 
++bp: --bp: Myx; 
++ap: --ap: MMa(q} 
MMa[a} 
MM,x; 
mmQ: 100; <0; >0: <a): >a): rit.r3 
macros
sw: " TSX5 SWIC" 
ci \\" 
CC: 
(auto,,): * EAaR 0,7\n 
(stat,,): “ EAsR STAT\n" 
(ia,q): " STA TEMP 
(iq,,a): * | $TQ TEMP 
- LDA TEMP\n" 
(auto|statlindirect,,): 
“%if(%o#'F), ADeR %co(e'F)\n,) 
TSX5 CTOeR" 
(ext|stringlit,,): 
= DaeR of 
eRRL 27” 
(rr): * EAsR sO, FL 
#RRL—s«18" 
(r,autojstatlindirect|stringlit): 
EAXS  —0,eF L\n” 
(r,, auto): * “el - . O7\n" 
(r,, stat): *  STAT\n" 


(r, autofstat): “artto ADeF = Kealw’R}\n,) 
(r,stringlit): ° ; caer aR 


: TSX4 TOC" 
(r,ext): * #FLS 27 
STaF af 
#FRL 27" 
(q,,ia): 
"Kif(X%o(#’R), ADA %o( 
TSX4 ATOC" 
(a,iq): : 
“%it(%o(#’R), ADQ %cole#’R)\n,) 
TSX4 .QTOC" 
lis 
(r,,M): " STaF oR” 
(4,7): " LOaR af" 
(rr): * LLR . 36". 
ff: , i 
(f,,M): ” FSTR aR” 
(Mf): : FLD af” 
dd: 

(fy): " DFST eR" 
(M,,f): * DFLD aF* 
(r,r): * LLR 36" 
(rN): * STaF ar” 
(M,r): “ LDeR af” 


df; 


>>: 
Gintlit,): * 
G~intlit,): * 


<<; 


Gintlit,): ° 


(,~intlit,: * 




++bp: 
(x): * 


(nr): ” 


--bp: 
(x): * 


Gor): * 


++ap: 
(x): * 
(,alq): ” 
(,a): * 
(,q): " - 
ap: 
(«3 * 
(»alq): * 
(,a): " 


(,q): * 


-BNOT: * 


&u: 

- (ialiqur): 
“Xif(%o(#"F), 
{ia,q): * 

{iq,,a): ” 
(autojstat,,r): ° 
Xif(Xo(e’F), 
(ext|stringlit,r): ” 
(0): * 


aF 
Xo(e’S)/4,01 
af" 


aF 
%co(#’S) 
af" 


aF 
~%o(#'S)/4,01 — 
af" 


a 


%o(#"S) 
aF* 


aF 


” Yo(#’S)/4,01 
af" 


“F 
aF\n" 
%co(#’S) 
aF* 


%co(#’S) 
“F" 


aF 
-%0(#"S)/4,e1 
oF" 


%co(#’F)\n,)\\" 
36" 

36” 

%n(#3,0) 
%co(#"F)\n,)\\" 
— 

af" 


SYMREF PROLG,EPILG, TEMP, SWTCH 


CMPeF aS 
TZE aR” 
CMPeF aS 
TNZ aR" 
CMPeF aS 
TZE #42 
TNC aR” 
‘CMPeF 2S. 
TZE +2 
TRC oR” 
CéMPeF aS 
TZE aR 
TNC — oR 
CMPeF aS 
TRC eR 
DFCMP = =@D0\n" 
CMPeR = 0,DL\n” 
“Sajc(a0,a2)" 
SBLeF aS 
@FRL 16" 
GMAP*" 

TRA 0” 
"#1" 

SYMODEF . #0” 
SYMREF 40” 
SYMREF 

EQU 2" 
EQU * 
TSxXO -PROLG 
ZERO .FSe0" 
"=V20/#1,16/0" 
TSX1 sow 
ZERO —-——s- @# 1/400" 
TRA PILG" 
TRA EPILG 
EQU (0 fe 


TRA ——s aR® 
cpq:

(auto,,): * _ EAQ 0,7\n". 
(stat,,): " EAQ STAT\n" 
(ia,,): * LLR =: 36\n" 


(autojstatlindirect,,): 
"Kif(Xo(#"F), § ADQ = %co(#’F)\n,)\\" 


++be: 
(autojstat|indirect,,): 
“xcpq(0,0,0,0"F) 
STQ -TEMP 
LDA «TEMP 
TSX5 TOA 
ADA 1,DL 
ANA —-=0377,DL 
7 OAL 
TBx4 TOC" 
(ext : ae 
STA 
~-be: 
(autojstatlindirect,,): 
"XcpqX0,0,0,0'F) 
STQ -TEMP 
LDA .TEMP 
TSX5 .CTOA 
SBA 1,0 
ANA =0377,DL 
m0) WAL 
TSX4 .QTOC* 
(ext,,): * LDA oF 
SBA =01000,DU 
STA oF" 
++4aC: 
(auto|statlindirect,,): 
“xcpq(0,0,0,9"F) 
STQ .TEMP 
LDA -TEMP 
TSX5 CTOA 
EAXS LAL 
TSX4 .QTOC" 
(ext,,): ” LDA oF 
L0Q. sw 
ADQ =01000,0U 


Appendix V - The HIS-6000 C Routine Macro Definitions


The C routine macro definitions used in the HIS-6000 implementation are listed on the following pages. A
C routine macro definition is written as a C function returning a character string value. This character
string is "substituted" for the macro call and rescanned by the macro expander; thus, it may contain
references to its arguments and embedded macro calls. The formal parameters of the C routine are ARGC
and ARGV: ARGC is an integer specifying the number of (character-string) arguments present in the
associated macro call; ARGV is an array of pointers to those arguments.


When the following routines were written, the formatted print routine PRINT was capable of producing
output only onto a file and not into a string in core; thus, where formatting is necessary, these routines
print their output directly and return the null string. Although there are dangers inherent in this practice,
in these cases the effect is the same as if the formatted string were returned and printed normally. The
character sequences '\t', '\n', and '\\' represent tab, newline, and backslash, respectively.


char *fn[]
    {"in","c","f","nf","d","nd","al",
     "ad","z","ln","eq","sr","i","end","n","jc",
     "other","if"};
char *(*ff[])()
    {aint,achar,afloat,anegf,adouble,anegd,aalign,
     aadcon,azero,aln,aequ,astring,aidn,aend,aname,ajc,
     other,aif};
int nfn 18,
    lineno 0,
    mflag 0,
    packb[4],
    packno;


char *aln(argc,argv) int argc; char *argv[];
    {lineno=atoi(argv[0]);
    packf();
    return(".N#0\tEQU\t*");
    }

char *aequ(argc,argv) int argc; char *argv[];
    {packf();
    return("#0\tEQU\t*");
    }

char *aint(argc,argv) int argc; char *argv[];
    {packf();
    return("\tDEC\t#0");
    }

char *achar(argc,argv) int argc; char *argv[];
    {if (argc>0) packc(atoi(argv[0]));
    return("\\");    /* conceal following newline */
    }

char *afloat(argc,argv) int argc; char *argv[];
    {packf();
    if (argc>0) print("\tDEC\t%m",atoi(argv[0]));
    return("");
    }

char *adouble(argc,argv) int argc; char *argv[];
    {packf();
    if (argc>0)
        {print("\tDEC\t");
        return(adbic(atoi(argv[0])));
        }
    return("");
    }

char *anegf(argc,argv) int argc; char *argv[];
    {packf();
    if (argc>0) print("\tDEC\t-%m",atoi(argv[0]));
    return("");
    }

char *anegd(argc,argv) int argc; char *argv[];
    {packf();
    if (argc>0)
        {print("\tDEC\t-");
        return(adbic(atoi(argv[0])));
        }
    return("");
    }


char *astring(argc,argv) int argc; char *argv[];
    {auto int i,f,lc,c;
    auto char *cp;

    lc=0;    /* location counter in STRING file */
    f=xopen(pname,fn_string,MREAD,BINARY);

    while(1)
        {packf();
        c=cgetc(f);
        if(ceof(f)) break;
        print(".S%d\tEQU\t*\n",lc);
        lc++;
        while(1)
            {if (c=='$')
                {c=cgetc(f);
                lc++;
                if (c=='0') c='\0';
                packc(c);
                }
            else
                {packc(c);
                if (!c) break;
                }
            c=cgetc(f);
            lc++;
            }
        }

    cclose(f);
    return("\\");
    }


char *aend(argc,argv) int argc; char *argv[];
    {packf();
    return("\tEND");
    }


char *regnames[] {"X0","X1","X2","X3","X4","A","Q","F"};

char *aname(argc,argv) int argc; char *argv[];
    {auto int base,offset;

    if (argc>1) offset=atoi(argv[1]); else offset=0;
    if (argc>0) base=atoi(argv[0]); else base=0;
    if (mflag) cprint("ANAME(%d,%d)\n",base,offset);
    if (base>=0) return(regnames[base]);
    base = -base;
    if (base >= c_indirect)
        {print("%d,%d",offset/4,base-c_indirect);
        goto check;
        }
    else switch(base) {

    case c_auto:
        print("%d,7",offset/4);
        goto check;

    case c_extdef:
        return("%i(#1)");

    case c_static:
        print(".STAT+%d",offset/4);
        goto check;

    case c_param:
        print("%d,6",offset/4);
        goto check;

    case c_label:
        print(".L%d",offset);
        break;

    case c_integer:
        if (offset<0 || offset>32000) print("=%d",offset);
        else print("%d,DL",offset);
        break;

    case c_float:
        print("=%s",adbic(offset));
        break;

    case c_string:
        print(".S%d",offset);
        break;
        }

    return("");

check:
    if (offset%4) error(6025,lineno);
    return("");
    }


/**********************************************************************

    AALIGN - align location counter

**********************************************************************/

char *aalign(argc,argv) int argc; char *argv[];
    {
    switch(atoi(argv[0])) {
    case ct_double:
        packf();
        return("\tEVEN");
        }
    return("\\");
    }
/**********************************************************************

    AJC - emit conditional jump

**********************************************************************/

char *ajc(argc,argv) int argc; char *argv[];
    {auto int cond;
    cond=atoi(argv[0]);

    switch(cond) {

    case cc_eq0: return("\tTZE\t#1");
    case cc_ne0: return("\tTNZ\t#1");
    case cc_lt0: return("\tTMI\t#1");
    case cc_ge0: return("\tTPL\t#1");
    case cc_gt0: return("\tTZE\t*+2\n\tTPL\t#1");
    case cc_le0: return("\tTZE\t#1\n\tTMI\t#1");
        }
    return("");
    }

char *other(argc,argv) int argc; char *argv[];
    {switch(atoi(argv[0])) {

    case 5: return("Q");
    case 6: return("A");
        }
    }

char *aif(argc,argv) int argc; char *argv[];
    {return(atoi(argv[0])?"#1":"#2");
    }


/* PACK CHARACTERS INTO WORDS */

packc(i) int i;
    {
    packb[packno++]=i;
    if (packno>=4)
        {print("\tVFD\t9/%d,9/%d,9/%d,9/%d\n",
            packb[0],packb[1],packb[2],packb[3]);
        packno=0;
        }
    }

packf()
    {while(packno) packc(0);
    }


char *aadcon(argc,argv) int argc; char *argv[];
    {packf();
    return("\tZERO\t#0");
    }


char *azero(argc,argv) int argc; char *argv[];
    {auto int i,j;

    if (argc>0)
        {i=atoi(argv[0]);
        while(packno && i) {packc(0);i--;}
        j = i/4; i = i%4;
        if (j) print("\tBSS\t%d\n",j);
        while(i--) packc(0);
        }
    return("\\");
    }


char *aidn(argc,argv) int argc; char *argv[];
    {auto char *cp1,*cp2;
    static char n[7];
    auto int i,c;

    if (argc>0)
        {cp1 = &cstore[atoi(argv[0])];
        cp2 = n;
        for(i=0;i<6;i++)
            {c = *cp1++;
            if (c == '_') c = '.';
            *cp2++ = c;
            }
        *cp2='\0';
        return(n);
        }
    return("");
    }
adbic(i)
    {auto char *cp1,*cp2;
    static char buf[30];
    auto int c,flag;

    flag=FALSE;
    cp1 = &cstore[i];
    cp2 = &buf[0];

    while(c = *cp1++)
        {if (c == 'E')
            {flag=TRUE;
            c='D';
            }
        if (cp2 < &buf[27])
            *cp2++ = c;
        }
    if (!flag)
        {*cp2++ = 'D';
        *cp2++ = '0';
        }
    *cp2++ = '\0';
    return(&buf[0]);
    }
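
The last routine in the listing rewrites the source form of a floating-point constant so that it carries a 'D' exponent, as used for the double-precision DEC constants emitted by the routines above. A minimal sketch of the same conversion in present-day C follows. The function name and the caller-supplied buffer are assumptions for the example; the routine in the listing instead reads from CSTORE and returns a pointer to a static buffer.

```c
#include <stddef.h>

/* Copy the constant in src to out, replacing an 'E' exponent marker by
 * 'D'; a constant without an exponent gets "D0" appended.  Over-long
 * constants are truncated so that "D0" and the terminating NUL still
 * fit.  Only upper-case 'E' is recognized, as in the listing. */
char *to_d_exponent(const char *src, char *out, size_t outsize)
{
    size_t n = 0;
    int saw_exponent = 0;

    if (outsize < 4) {               /* too small to hold anything useful */
        if (outsize > 0)
            out[0] = '\0';
        return out;
    }
    for (; *src != '\0'; src++) {
        char c = *src;
        if (c == 'E') {
            saw_exponent = 1;
            c = 'D';
        }
        if (n < outsize - 3)         /* leave room for "D0" and NUL */
            out[n++] = c;
    }
    if (!saw_exponent) {
        out[n++] = 'D';
        out[n++] = '0';
    }
    out[n] = '\0';
    return out;
}
```

With a 30-byte buffer this keeps at most 27 characters of the constant, which matches the limit tested against `&buf[27]` in the listing.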
Appendix VI - Overall Description of the Compiler


The compiler consists of four major phases. First, the lexical analysis phase (C1) transforms the source
program into a string of lexical tokens such as identifiers, constants, and operators. Second, the syntactic
analysis phase (C2) parses the token string and produces a tree representation of each function
(procedure) defined in the source program. Third, the code generation phase (C3) transforms the trees
produced by the syntactic analysis phase into an intermediate language program consisting of a sequence
of macro calls representing instructions of the particular abstract machine defined by the implementer.
Finally, the macro expansion phase (C4) expands the macro calls, producing an object language program
as the output of the compiler. In addition, there is an error message editor (C5) which is invoked last in
order to format any error messages produced by the other phases. The phases of the compiler are
invoked in sequence by the control program (CC). The control program communicates with the various
phases by passing as arguments to an invoked phase a set of character strings representing file names
and an option list; the invoked phase returns a completion code which indicates whether or not any
serious or fatal errors occurred during the execution of that phase. The various phases communicate
with each other using intermediate files.


The lexical and syntax analysis phases may be run sequentially as described above, or, where a system’s 
program size restrictions permit, may be combined into a single phase, thus eliminating the use of an 
intermediate file. This option is implemented through the use of compile-time conditionals. The remainder
of this chapter will assume that the two phases are separate. 


1. The Lexical Analysis Phase 


The lexical analyzer reads in the source program and breaks it into a string of tokens such as identifiers,
constants, and operators. The lexical analyzer also interprets compile-time control lines which allow one
to include source from other files and to define manifest constants. The lexical analyzer produces output
onto three intermediate files: the TOKEN file, which contains the string of tokens, the CSTORE file, which
contains the source representations of identifiers and floating-point constants, and the STRING file, which
contains character string constants. The TOKEN file is passed to the syntax analysis phase; the CSTORE
and STRING files are not used until macro expansion. In addition, the lexical analyzer may write error
messages in an internal form onto the ERROR file. A token is represented by a pair of integers called the
TYPE and the INDEX of the token. The syntax analyzer performs its analysis on the basis of the token
TYPE; thus most operators have a distinct TYPE, and there are separate TYPEs for identifiers, integer
constants, floating-point constants, and character string constants. The INDEX is used to distinguish
particular identifiers or constants; for example, the INDEX of an identifier is the index of the source
representation of the identifier in the array of characters written onto the CSTORE file.
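
The TYPE/INDEX representation of an identifier can be pictured with a short sketch in present-day C. It is illustrative only: the structure, the TYPE code, the size of the character array, and the reuse of an existing entry for a repeated name are assumptions of the example, not details given in this description.

```c
#include <string.h>

struct token { int type; int index; };

enum { T_IDENTIFIER = 1 };      /* hypothetical TYPE code */

static char cstore[256];        /* names stored back to back, NUL-terminated */
static int cstore_used = 0;

/* Return a token for the identifier `name`.  Its INDEX is the position
 * of the name in cstore; a name seen before reuses its earlier entry.
 * INDEX is -1 if the array is full. */
struct token identifier_token(const char *name)
{
    struct token t = { T_IDENTIFIER, -1 };
    size_t len = strlen(name);
    int i = 0;

    while (i < cstore_used) {                   /* search existing names */
        if (strcmp(&cstore[i], name) == 0) {
            t.index = i;
            return t;
        }
        i += (int)strlen(&cstore[i]) + 1;
    }
    if ((size_t)cstore_used + len + 1 <= sizeof cstore) {
        memcpy(&cstore[cstore_used], name, len + 1);
        t.index = cstore_used;
        cstore_used += (int)len + 1;
    }
    return t;
}
```

A later phase that holds only the INDEX can recover the spelling as `&cstore[index]`, which is how the C routine macros in Appendix V use CSTORE.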


The main routine of the lexical analyzer consists of a loop which calls a routine GETTOK to return the 
next token in the input stream and then writes the token onto the TOKEN file. This loop also contains 
code to interpret compile-time control lines. GETTOK obtains input characters from a routine LEXGET 
which contains the logic to switch the input between the primary source file and “included” files. Except 
when processing character string constants, GETTOK translates the input characters using a translation 
table. On GCOS, this translation maps lower case into upper case, tabs into blanks, and carriage returns 
into newlines. This table would be changed when moving the compiler to a system using other than the 
ASCII character set. GETTOK partitions the character set into the following character classes:
1. letters

2. digits

3. apostrophe (')

4. quotation mark (")

5. newline

6. blank

7. period (.)

8. the escape character (\)

9. invalid characters

10. characters which are unambiguously single-character operators (such as '{')

11. characters which may begin a multi-character operator (such as '<' which may begin '<=')


GETTOK uses the character class of the current input character to determine its actions in analyzing the 
input string. 
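
The translation step and the partition into character classes can be sketched as follows. This is a hedged illustration in present-day C rather than the compiler's code: the enumeration names, the treatment of '_' as a letter, and the exact membership of the two operator classes are assumptions, and the compiler itself uses a translation table rather than the conditional tests shown here.

```c
#include <ctype.h>
#include <string.h>

enum char_class {
    CC_LETTER, CC_DIGIT, CC_APOSTROPHE, CC_QUOTE, CC_NEWLINE, CC_BLANK,
    CC_PERIOD, CC_ESCAPE, CC_INVALID, CC_SINGLE_OP, CC_MULTI_OP
};

/* The GCOS translation described above: lower case to upper case,
 * tab to blank, carriage return to newline. */
int translate(int c)
{
    if (c == '\t') return ' ';
    if (c == '\r') return '\n';
    return toupper((unsigned char)c);
}

/* Classify an (already translated) input character. */
enum char_class classify(int c)
{
    if (isalpha((unsigned char)c) || c == '_')  /* '_' as letter: assumed */
        return CC_LETTER;
    if (isdigit((unsigned char)c))
        return CC_DIGIT;
    switch (c) {
    case '\'': return CC_APOSTROPHE;
    case '"':  return CC_QUOTE;
    case '\n': return CC_NEWLINE;
    case ' ':  return CC_BLANK;
    case '.':  return CC_PERIOD;
    case '\\': return CC_ESCAPE;
    }
    /* Operator sets chosen for illustration only. */
    if (c != 0 && strchr("{}()[];,?:~", c) != NULL)
        return CC_SINGLE_OP;
    if (c != 0 && strchr("<>=!+-*/%&|^", c) != NULL)
        return CC_MULTI_OP;
    return CC_INVALID;
}
```

A scanner built this way would call `classify(translate(c))` for each input character outside character string constants and dispatch on the resulting class.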


2. The Syntax Analysis Phase 


The syntax analyzer accepts as input the token string generated by the lexical analyzer and produces
output onto three intermediate files for the code generation phase: a tree representation of each function
defined in the source program is written onto the NODE file; a symbol table containing declarative
information about identifiers is written onto the SYMTAB file; and information regarding specified initial
values of variables is written onto the INIT file.


The main routine of the syntax analysis phase is a table-driven LALR(1) parser. The tables are generated
by a parser-generator YACC, written by S. C. Johnson [18]. The input to YACC is a BNF-like description
of the syntax of C, augmented by action routines which are to be invoked by the parser when particular 
reductions are made. YACC analyzes the grammar and produces a set of tables written in C which are 
then compiled into the syntax analysis phase. 


The tables produced by YACC represent instructions to the parser to test the TYPE of the current input 
token, to shift the current input token onto the stack, to perform a reduction and call an action routine, or 
to report a syntax error. When a syntax error is discovered, the parser writes error messages onto the 
ERROR file which give the current state of the parse. It then attempts to recover from the error so that 
any additional syntax errors in the program can meaningfully be reported. The parser attempts a 
recovery by popping states from the stack and/or skipping input tokens in various combinations. A 
recovery attempt is considered successful if the next five input tokens are shifted without detecting a 
new syntax error. If a recovery attempt is successful, error messages are written which describe the 
recovery actions taken and parsing is continued. If a successful recovery cannot be made within a limited
region of the input program, the parser ceases execution after writing an error message. 


The following C program illustrates the compiler’s response to a syntax error, in this case unmatched 
parentheses: 


int c; 

int f(file) 

{if ((c=getc(file) != 0) return(-1); 
return(0); 

} 


The first error message, listed below, gives the state of the parse when the syntax error was discovered,
followed by a cursor symbol '_', followed by the next five input tokens. The next error message indicates
that the parser was able to recover from the error by skipping the next two input tokens. The resulting
program, although syntactically correct, is meaningless. Therefore, in order to avoid extraneous error
messages, the code generation phase and the macro expansion phase are not executed when syntax
errors have been detected.


3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ RETURN ( - 1 ) ;
3: SKIPPED: RETURN (


The following program also contains a syntax error due to unmatched parentheses; however, since there
are no more right parentheses in the statement following the point where the error is detected, the
parser recovers from the error by deleting the unfinished IF clause.


int c; 

int f(file) 

{if ((c=getc(file) == 0) c=-1;
return(c); 


} 


3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = - 1 ;
9: DELETED: IF ( <e>


The following program is an example of a syntax error from which the parser could not recover within its
allowed limits; thus, after skipping input tokens up to this limit, the parser gives up.


int c;

int f(file)

{if ((c=getc(file) != 0) c = 1;
else c = 0;
return(c);

}


3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = 1 ; ELSE
3: SKIPPED: C = 1 ;
4: I GIVE UP.


3. The Code Generation Phase 


The code generation phase performs the following operations: (1) allocates storage for (determines the
run-time locations of) variables, (2) performs type checks on operands and inserts conversion operators
where necessary, (3) translates the tree representation of expressions into a more descriptive form with
AMOPs, (4) performs some machine-independent optimizations on expressions, (5) emits macro calls to
define names which may be referenced by other programs (ENTRY symbols) and to declare names which
are assumed to be defined in other programs (EXTRN symbols), (6) emits macro calls to define and
initialize variables, (7) emits macro calls to execute the control statements of each function defined in the
source program, and (8) emits macro calls to evaluate expressions.


The code generation phase reads the NODE, SYMTAB, and INIT files produced by the syntax analysis
phase and writes an intermediate language program in the form of macro calls onto two intermediate files,
the MAC file and the HMAC file. The HMAC file contains the macro calls defining ENTRY symbols and
EXTRN symbols which are produced last by the code generation phase but which, in some systems, may
be required to appear at the beginning of the assembly language program. The MAC file contains the
remainder of the intermediate language program.


The main routine of the code generation phase consists of a call to a routine SALLOC, which allocates run-
time storage and emits macro calls to define and initialize variables, followed by a loop which reads in the
tree representation of a single C function from the NODE file and generates code (macro calls) for that
function, followed by a call to a routine SDEF which emits macro calls to define ENTRY and EXTRN
symbols.


The generation of code for a C function begins with a call to a routine FHEAD with the name of the 
function as an argument. FHEAD emits a PROLOG macro call which defines the entry point and produces 
code to set up the proper run-time environment. FHEAD then allocates storage in the run-time stack 
frame for the automatic variables of the function; storage is allocated for automatic variables in order of 
decreasing alignment requirement so that no space is wasted in the stack frame. The stack frame is 
assumed to be aligned according to the strictest of the alignment requirements of the various C data 
types (usually that of double-precision floating-point). A save area of the size specified in the machine 
description is reserved at the beginning of the stack frame. 
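
The allocation policy just described, together with the final rounding of the frame size, can be illustrated with a small sketch in present-day C. The structure, the function names, and the fixed set of alignment values are assumptions made for the example; the values used in the usage note (sizes 1, 4, and 8 bytes and a 16-byte save area) are those of the HIS-6000 machine description in Appendix IV.

```c
#include <stddef.h>

struct auto_var { int size; int align; int offset; };

/* Round n up to the next multiple of align (align > 0). */
static int round_up(int n, int align)
{
    return (n + align - 1) / align * align;
}

/* Assign frame offsets to automatic variables, strictest alignment
 * first, starting after a save area of save_area_size bytes.  Returns
 * the frame size rounded up to a multiple of frame_align.  Only the
 * alignments 8, 4, 2, and 1 are handled in this sketch. */
int allocate_frame(struct auto_var *vars, int nvars,
                   int save_area_size, int frame_align)
{
    static const int aligns[] = { 8, 4, 2, 1 };
    int offset = save_area_size;
    size_t a;
    int i;

    for (a = 0; a < sizeof aligns / sizeof aligns[0]; a++) {
        for (i = 0; i < nvars; i++) {
            if (vars[i].align != aligns[a])
                continue;
            offset = round_up(offset, vars[i].align);
            vars[i].offset = offset;
            offset += vars[i].size;
        }
    }
    return round_up(offset, frame_align);
}
```

For a char, a double, and an int declared in that order, the double is placed at offset 16, the int at 24, and the char at 28; no padding is needed between them, and the frame size is rounded from 29 up to 32.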


The call to FHEAD is followed by a call to the routine STMT to generate code for the compound statement 
which is the body of the C function. The generation of code for the body of a C function occurs on two
levels, the statement level and the expression level. The generation of code for statements is handled by 
the routine STMT which takes one argument, a pointer to a subtree representing a C statement. STMT is 
actually a very short routine which makes recursive calls to itself for the branches of a STATEMENT_LIST 
node and calls a larger routine ASTMT if the specified node is an actual statement (as opposed to a 
statement list). The purpose of splitting code generation for statements into the two routines STMT and 
ASTMT is to minimize the amount of stack space used while recursively descending the statement tree. 
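
The division of labor between STMT and ASTMT can be sketched as follows. The node layout and the counter are hypothetical, and the stand-in for ASTMT does nothing but count; in the compiler, ASTMT is the large routine that handles each kind of statement, and it would call STMT again for any nested statement list.

```c
#include <stddef.h>

enum node_op { STATEMENT_LIST, SIMPLE_STATEMENT };

struct node {
    enum node_op op;
    const struct node *left, *right;    /* branches of a STATEMENT_LIST */
};

static int statements_generated;        /* stands in for emitted macro calls */

/* Stand-in for ASTMT, the routine with the large stack frame. */
static void astmt(const struct node *s)
{
    (void)s;
    statements_generated++;
}

/* STMT has no locals of its own, so each level of recursion through a
 * long statement list costs only a minimal stack frame. */
void stmt(const struct node *s)
{
    if (s == NULL)
        return;
    if (s->op == STATEMENT_LIST) {
        stmt(s->left);
        stmt(s->right);
    } else {
        astmt(s);
    }
}
```

The saving comes from the fact that the deep recursion happens only in the small routine; the large frame of the statement handler is live for one statement at a time.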


Following the call to STMT to generate code for the body of the C function, the size of the stack frame is 
adjusted to be a multiple of the stack alignment and an EPILOG macro call is emitted. On the HIS-6000, 
the EPILOG macro defines an assembly-language symbol whose value is the stack frame size; this symbol 
is referred to by the code produced by the PROLOG macro which allocates the stack frame. 


4. The Macro Expansion Phase 


The macro expansion phase expands the macro calls on the HMAC and MAC intermediate files using the
information on the CSTORE and STRING intermediate files and places the result of that expansion on the
output file. The macro expander is not a general-purpose macro processor; in particular, there are no
built-in macro calls for defining macros or for handling local or global variables. Furthermore, the total
number of characters (after any macro expansion) in the argument list of a macro call is limited to 100. 
The maximum allowed depth of nested macro calls is 10. 


The macro expander processes a stream of characters terminated by a NULL character. Within this
stream of characters, the characters '%', '#', and '\' have special significance. The '%' character indicates
the beginning of a macro call, which consists of the '%', followed by the name of the macro, followed by a
(possibly null) list of character string arguments separated by commas and enclosed in parentheses. The
'#' character is used within the body of a macro definition to refer to the arguments of the macro call; the
character sequences '#0' through '#9' refer to arguments 0 through 9, respectively. The '\' character is
an escape character. The special interpretation of a character such as '%', '#', ')' or ',' is inhibited when
that character is preceded by a '\'. In addition, the character sequences '\t', '\n', '\r' are used to
represent tab, newline, and carriage-return, respectively. A '\' character followed by a newline character
results in both characters being ignored; thus a macro which expands to a backslash will swallow the
newline which followed the macro call in the input file. (A macro call in the input file which expands to
the null string will leave a blank line in the compiler output; this is generally a sign that the implementer
has not completely specified the macro definition for an AMOP.) The backslash character itself is
represented by '\\'.


The normal operation of the macro expander consists of copying characters directly from the input stream
to the output stream. When a '%' is encountered, the name of the macro and the arguments of the macro
call are evaluated and collected in a buffer; this evaluation may itself involve the processing of embedded
macro calls. The input stream is then switched to the body of the macro definition and normal processing
is resumed. When a '#' is encountered, the argument number is read and the input stream is switched to
the corresponding character string argument of the current macro call, which is stored in the associated
buffer. Normal processing is then resumed. The input stream operates in a stack-like manner in that
when the end of a macro definition or an argument string is reached, the input stream is restored to its
previous state. When end of file is reached on the HMAC file, the input stream is switched to the MAC
file; when end of file is reached on the MAC file, macro expansion is terminated.


There are three types of macros which are handled by the macro expander. First, there are the macros
representing three-address abstract machine instructions, which are produced by the code generator
while processing expressions. These macros are defined only in the machine description; the macro calls
are of a special form which directly specifies the internal number of the corresponding macro definition,
as assigned by GT. For example, the macro call %3 refers to macro definition number 3. Second, there
are the keyword macros which are produced by the code generator while processing function definitions
and statements. These macros may be defined either in the machine description or by C routines; the
macro calls specify the macro names as given in Appendix III. Finally, there are the macros which are
created by the implementer and used within other macro definitions. These macros may be defined
in the machine description or by C routines; the macro calls specify the macro names chosen by the
implementer.


A macro which is defined in the machine description is specified as a list of one or more character string
constants, possibly with associated location prefixes for conditional expansion. Such a macro definition is
implemented as a list of pointers to the character string constants with associated integers
representing the conditions specified in the location prefixes, if any. These lists are accessed through an
array MACDEF, produced by GT, which is indexed by the internal macro definition number assigned by
GT to each macro definition in the machine description. As mentioned above, a macro call representing a
three-address abstract machine instruction directly specifies the macro definition number. Other macros
defined in the machine description are represented in a table produced by GT which associates the macro
names with the corresponding macro definition numbers.


Macros defined by C routines are represented in a table provided by the implementer which associates
the macro names with the corresponding C functions. This table consists of an array FN of pointers to
the character string macro names, an array FF of pointers to the corresponding C functions, and an
integer NFN specifying the number of entries in the table. It would be more convenient for the
implementer to specify the C macro definitions in the machine description and let GT construct NFN, FN,
and FF; however, this was not done because of the lexical difficulties associated with including C source in
the machine description.
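
A table of this shape, and the way a C routine macro might be selected through it, can be sketched in present-day C. The two routines are modelled loosely on AEND and OTHER from Appendix V; the function names, the two-entry table, and the linear search are assumptions of the example (the expander itself enters the names into a hash table).

```c
#include <stdlib.h>
#include <string.h>

typedef const char *(*macro_fn)(int argc, char **argv);

/* Expands to the END statement. */
static const char *m_end(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    return "\tEND";
}

/* Maps register number 5 to "Q" and 6 to "A". */
static const char *m_other(int argc, char **argv)
{
    if (argc < 1)
        return "";
    switch (atoi(argv[0])) {
    case 5: return "Q";
    case 6: return "A";
    }
    return "";
}

static const char *fn[] = { "end", "other" };       /* macro names */
static const macro_fn ff[] = { m_end, m_other };    /* matching routines */
static const int nfn = 2;                           /* number of entries */

/* Call the C routine macro called `name`; an undefined name yields
 * the null string. */
const char *call_c_macro(const char *name, int argc, char **argv)
{
    int i;

    for (i = 0; i < nfn; i++)
        if (strcmp(fn[i], name) == 0)
            return ff[i](argc, argv);
    return "";
}
```

Keeping the names and the function pointers in parallel arrays means a new C routine macro is added by appending one entry to each array and incrementing the count.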


The macro expander is implemented as two levels of get-character routines. The lower level routine,
GETC1, returns the next character from the current input source which may be either the input file
(HMAC or MAC intermediate file) or a character string in memory. If it is a character string, it may be
part of a definition of a macro specified in the machine description, an argument of the current macro call,
or the result returned by a C routine macro definition. The current state of the input stream is kept in a
stack of structures called input control blocks (ICBs). GETC1 uses the top ICB on the stack to determine
the source of the next character. The members of an ICB structure are listed below with their meanings:


F           a flag indicating the type of the current input source (the input file, a macro
            defined in the machine description, or a character string)


LOCP        if the current input source is a macro defined in the machine description, this is a
            pointer to the current position in the list containing the pointers to the character
            strings which make up the macro definition


CP          if the current input source is not the input file, this is a pointer to the next
            character in the current character string


ARGV[10]    an array of pointers to the character string arguments of the current macro call


BASE[3]     the REF.BASEs of the result, the first operand, and the second operand of the
            current macro call, used when computing conditional expansion


A NULL character indicates the end of a character string or end-of-file on an input file; thus if the current 
input character is NULL, GETC1 updates the current state of the input stream by advancing LOCP or by 
popping an ICB off the stack or by switching the input file from the HMAC to the MAC intermediate file. 
GETC1 returns the NULL character only upon end-of-file on the MAC intermediate file. 
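
The stack-like behavior of the input stream can be illustrated with a reduced version of GETC1 that reads only from character strings in memory. This is a sketch under stated assumptions: each stack entry holds just the equivalent of the CP member, the input file, LOCP, ARGV, and BASE are omitted, and the names `input_stack` and `push_input` are invented for the example. The depth limit of 10 is the nesting limit given earlier for macro calls.

```c
#include <stddef.h>

#define MAX_DEPTH 10

struct input_stack {
    const char *cp[MAX_DEPTH];  /* next character of each active string */
    int depth;                  /* number of active strings */
};

/* Switch the input to string s.  Returns 0 if the stack is full. */
int push_input(struct input_stack *in, const char *s)
{
    if (in->depth >= MAX_DEPTH)
        return 0;
    in->cp[in->depth++] = s;
    return 1;
}

/* Return the next character.  When the current string is exhausted the
 * previous source is restored; '\0' is returned only when every string
 * on the stack has been consumed. */
int getc1(struct input_stack *in)
{
    while (in->depth > 0) {
        int c = (unsigned char)*in->cp[in->depth - 1];
        if (c != '\0') {
            in->cp[in->depth - 1]++;
            return c;
        }
        in->depth--;            /* end of string: pop this source */
    }
    return '\0';
}
```

Pushing "XY" after one character of "ab" has been read yields the sequence a, X, Y, b: the interrupted string resumes exactly where it left off, which is the property the macro expander relies on.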


The higher level get-character routine is MGET, which implements the '#', '%', and '\' conventions. MGET
begins by calling GETC1 to obtain a character. If the character returned is a backslash, then GETC1 is
called again to obtain the second character of the escape sequence and the appropriate action is taken:
If the escape sequence is '\t', '\n', or '\r', then the character is taken to be tab, newline, or carriage
return, respectively. If the second character is a newline, then it is ignored, and MGET returns the result
of a recursive call to itself. Otherwise, the second character is returned as the value of MGET (thus it is
protected from special interpretation).


If the resulting character is not a ’#’ or a ’%’, then MGET returns that character directly. A ’#’ followed 
by a digit results in pushing a new ICB onto the stack pointing to the appropriate character string 
argument of the current macro call. A ’#’ followed by ’0’, ’F’, ’S’, or ’R’ (see Appendix I, section 3) results 
in a call to the C routine ANAME (which implements the NAME macro) with the appropriate arguments. 
When a ’%’ is encountered, the macro name is collected and the arguments are assembled into a 
100-character buffer. The macro name and the arguments are obtained by recursive calls to MGET so that 
embedded macro calls are expanded; the result of expanding an embedded macro call may include commas 
or right parentheses without interfering with the argument structure of the macro call being processed. 
If the macro name is an integer, the correspondingly numbered macro definition from the machine 
description is used; otherwise, the macro name is looked up in a hash table containing the names of all 
defined macros. If the macro is defined in the machine description, a new ICB is pushed onto the 
stack with LOCP pointing to the beginning of the list of pointers to character strings which represents the 
macro definition. Otherwise, if the macro is defined by a C routine, the C function is called and an ICB is 
pushed onto the stack which points to the character string returned by that function; thus references to 
arguments and embedded macro calls in the string returned by the C function are processed. MGET then 
resumes normal operation by calling GETC1. Note that the effect of a call to an undefined macro is to 
replace the macro call by the null string; no error messages are produced by the macro expander. 


The main routine of the macro expander consists of initialization, including the setting up of the hash 
table, followed by a loop which calls MGET repeatedly and writes the returned character onto the output 
file; this loop terminates when the returned character is NULL. 


5. The Error Message Editor 


The error message editor is invoked as the last phase of the compiler to read from the ERROR 
intermediate file the error records written by the previous phases and to print error messages 
corresponding to those error records. The error message editor allows variable data, such as identifier 
names, to be included in the printed messages. In addition, error messages of arbitrary length can be 
constructed from a sequence of error records; the error message editor automatically breaks long output 
lines so that all output lines fit within a fixed page width. 


An error record is a structure containing seven integers: an error number, a line number, and five 
arguments. The error number selects a basic error message string which contains the fixed text of the 
error message and optional indicators for including variable data. An indicator is a two-character 
sequence beginning with a ’%’; the character following the ’%’ defines the interpretation of the variable 
data which will replace the indicator when the string is printed. The variable data is specified by one or 
more of the arguments in the error record. The arguments are associated with the indicators from left to 
right; arguments are used as needed according to the interpretations specified by the indicators. The 
various indicators are listed below with their interpretations: 


%d print the next argument as a decimal integer 


%m print the string in the internal compiler table CSTORE which begins at the index 
specified by the next argument 


%n print a string representing a node (operator) of the internal representation produced by 
the syntax analysis phase, as specified by the next argument 


%q print a string representing the terminal or nonterminal symbol associated with the 
parser state specified by the next argument 


%t print the source representation of the token whose TYPE and — are specified by 
the next two arguments 


%% print a ’%’ 


Only the arguments which are referenced by the basic error message string are specified when an error 
record is written; the values of the remaining arguments in the record are undefined. 
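

As an illustration of how indicators consume arguments from left to right, the following C sketch expands a basic error message string into a buffer. It is a sketch under stated assumptions, not the editor's actual code: the structure and field names are hypothetical, only the decimal-integer and percent indicators are rendered faithfully, and the indicators that depend on compiler tables are shown as bracketed placeholders that merely consume the right number of arguments.

```c
#include <stdio.h>
#include <string.h>

/* Seven integers, as in the text; the field names are hypothetical. */
struct error_record {
    int number;     /* selects the basic error message string       */
    int lineno;     /* source line, or <= 0 for a continuation      */
    int arg[5];     /* variable data, consumed from left to right   */
};

/* Return the next unused argument (0 if all five have been used). */
static int take(const struct error_record *r, int *next)
{
    int v = (*next < 5) ? r->arg[*next] : 0;
    (*next)++;
    return v;
}

/* Expand fmt into out (at most size-1 characters plus terminator). */
static void expand(const char *fmt, const struct error_record *r,
                   char *out, size_t size)
{
    size_t n = 0;
    int next = 0;
    char tmp[48];

    if (size == 0)
        return;
    while (*fmt != '\0' && n + 1 < size) {
        if (fmt[0] != '%' || fmt[1] == '\0') {
            out[n++] = *fmt++;          /* ordinary text */
            continue;
        }
        char kind = fmt[1];
        fmt += 2;
        if (kind == '%') {
            strcpy(tmp, "%");
        } else if (kind == 'd') {
            snprintf(tmp, sizeof tmp, "%d", take(r, &next));
        } else if (kind == 't') {       /* uses two arguments */
            int first = take(r, &next);
            int second = take(r, &next);
            snprintf(tmp, sizeof tmp, "<t:%d,%d>", first, second);
        } else {                        /* any other indicator: placeholder */
            snprintf(tmp, sizeof tmp, "<%c:%d>", kind, take(r, &next));
        }
        for (const char *p = tmp; *p != '\0' && n + 1 < size; p++)
            out[n++] = *p;
    }
    out[n] = '\0';
}
```

For example, with arguments 3 and 7, the string "%d of %d (100%%)" expands to "3 of 7 (100%)".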


The line number field in the error record associates a line in the source program with the error which 
produced a particular error record. If a line number is given (LINENO > 0), it is printed out on a new line, 
followed by a colon, followed by the text specified by the error record; otherwise (LINENO <= 0), the text 
specified by the error record is printed on the current line. Thus an error message consists of an initial 
error record containing a line number followed by zero or more error records without line numbers. In 
this manner, an error message of arbitrary length can be constructed. For example, the message giving 
the current state of the parse when a syntax error has been discovered (see section 2) is constructed 
from the following basic error message strings: 


"SYNTAX ERROR. PARSE SO FAR: " 
" %q" (for each state on the parser stack) 
Pease (represents the input cursor) 
" %t" (for each of the next 5 input tokens) 


The syntax analysis phase can produce these error messages without counting the symbols in the 
message or knowing their lengths because the error message editor takes care of breaking long output 
lines. 
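

A minimal sketch of this arrangement is given below. It assumes that lines are broken only between blank-separated words and that runs of blanks collapse to a single blank; the text does not say how the editor chooses its break points, so this is one plausible policy rather than the actual one. The page width of 24 and all names are hypothetical, and output is captured in memory rather than written to the standard output.

```c
#include <stdio.h>
#include <string.h>

enum { PAGE_WIDTH = 24 };            /* hypothetical page width */

static char page[512];               /* captured output */
static size_t used;
static int column;                   /* current output column */

static void put(char c)
{
    if (used + 1 < sizeof page) {
        page[used++] = c;
        page[used] = '\0';
    }
    column = (c == '\n') ? 0 : column + 1;
}

/* Emit one word, starting a new line first if it would not fit. */
static void put_word(const char *w, size_t len)
{
    if (column > 0 && column + 1 + (int)len > PAGE_WIDTH)
        put('\n');
    else if (column > 0)
        put(' ');
    while (len-- > 0)
        put(*w++);
}

/* Emit the text of one error record.  A positive line number starts
   a new line and is followed by a colon; otherwise the text simply
   continues the current message. */
static void put_record(int lineno, const char *text)
{
    if (lineno > 0) {
        char prefix[16];
        int n = snprintf(prefix, sizeof prefix, "%d:", lineno);
        if (column > 0)
            put('\n');
        put_word(prefix, (size_t)n);
    }
    while (*text != '\0') {
        size_t len = strcspn(text, " ");
        if (len > 0)
            put_word(text, len);
        text += len;
        text += strspn(text, " ");
    }
}
```

The caller writes one record with a line number and then any number of continuation records, never needing to know how long the assembled message has become.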


In addition to selecting a basic error message string, an error number represents the severity level of 
the corresponding error: 


error number     severity 


1000 - 1999      error 
2000 - 3999      serious error 
4000 - 5999      fatal error 
6000 - 6999      compiler error 


A fatal error or a compiler error will terminate the current phase, and no remaining phase (except the 
error message editor) will be invoked; in addition, a compiler error message is automatically preceded by 
the string 


"COMPILER ERROR" 


A serious error allows the current phase to continue execution, but all remaining phases (except the error 
message editor) are skipped. 
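

The mapping from error number to severity, and the consequences of each severity, can be summarized in a few lines of C. This is a sketch only: the enumeration and function names are hypothetical, and it assumes the serious-error range runs from 2000 through 3999, adjoining the fatal-error range.

```c
enum severity {
    SEV_UNKNOWN,     /* number outside every documented range */
    SEV_ERROR,
    SEV_SERIOUS,
    SEV_FATAL,
    SEV_COMPILER
};

/* Classify an error number by the range it falls in. */
static enum severity severity_of(int number)
{
    if (number >= 1000 && number <= 1999) return SEV_ERROR;
    if (number >= 2000 && number <= 3999) return SEV_SERIOUS;
    if (number >= 4000 && number <= 5999) return SEV_FATAL;
    if (number >= 6000 && number <= 6999) return SEV_COMPILER;
    return SEV_UNKNOWN;
}

/* Fatal and compiler errors terminate the current phase. */
static int phase_must_stop(enum severity s)
{
    return s == SEV_FATAL || s == SEV_COMPILER;
}

/* Serious errors and worse cause the remaining phases (other than
   the error message editor) to be skipped. */
static int later_phases_skipped(enum severity s)
{
    return s == SEV_SERIOUS || s == SEV_FATAL || s == SEV_COMPILER;
}
```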


The error message editor writes its output onto the standard output unit, a 
terminal in a time-sharing system or a line printer in a batch system; when the compiler is 
submitted as a batch job by a time-sharing user, this output is redirected onto an error listing file. This 
is accomplished by passing the argument ">>$el" to the error message editor which indicates that output 
to the standard output unit is to be appended onto filecode EL (the error listing file). Redirection of 
standard input and output is a (not necessarily portable) feature of the C run-time system, rather than of 
the compiler itself. 




6. Invoking the Compiler Phases 


The mechanisms for invoking a phase of the compiler, passing arguments to it, and returning a completion 
code are operating system dependent. In general, the control program will be rewritten for each system 
on which the compiler runs; on some systems, the control program may be replaced by a set of job 
control cards (see Figure 1 on page 31). The source of the compiler phases need not be changed, 
however; the operating system dependencies associated with the invocation of a C program are isolated 
in two run-time routines, the startup routine and the exit routine. The startup routine receives control 
from the operating system, establishes the C run-time environment, and calls the C routine named MAIN. 
It is the responsibility of the startup routine to take the character string arguments, which may be 
provided by the operating system or written on a temporary file, and arrange them as an array of 
character strings which is then passed as an argument to MAIN. The exit routine EXIT is called upon a 
return from MAIN; it may also be called directly by a C program. The exit routine closes all open files 
and returns control to the operating system. EXIT has one optional argument, a return code, which it 
communicates to the control program as a completion or abort code or by writing it onto a temporary file. 


On UNIX, a phase of the compiler is invoked by calling the system routine FORK, which creates a new 
process, followed by a call in the new process to the system routine EXECL, which overwrites the process 
with the desired phase of the compiler and passes it a list of character strings as arguments. The old 
process waits for the execution of the compiler phase to finish by calling the system routine WAIT, which 
waits for the process to die and returns its completion code. 
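

In present-day POSIX terms, the sequence described above might be sketched as follows. This is an illustration, not the control program itself: the function name run_phase is hypothetical, exactly two string arguments are passed for simplicity, and the value 127 returned when the program cannot be executed is a convention chosen for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with two string arguments and return its
   completion code, or -1 if it could not be started or waited for. */
static int run_phase(const char *path, const char *arg1, const char *arg2)
{
    int status;
    pid_t w;
    pid_t pid = fork();              /* create the new process */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* In the new process: overwrite it with the desired phase. */
        execl(path, path, arg1, arg2, (char *)NULL);
        _exit(127);                  /* reached only if execl failed */
    }
    /* In the old process: wait for that particular child to die. */
    do {
        w = wait(&status);
    } while (w != pid && w >= 0);
    if (w < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A control program would call such a routine once per phase and stop invoking further phases when a nonzero completion code is returned.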


On GCOS, two methods are used to invoke a phase of the compiler from the control program, which runs 
in time-sharing. The first method uses a routine SYSTEM, a C-callable interface to the system call CALLSS 
which can invoke any time-sharing subsystem (program). The character string arguments are passed in 
the system teletype buffer (using the system call PSEUDO) so that to the invoked program it appears that 
it was invoked by a command typed at command level with those arguments. The completion code is 
stored (using the system call CORFIL) in the first word of the core file, a ten word buffer provided by the 
operating system for communication between a user’s subsystems. The disadvantage of running the 
compiler phases in time-sharing is that the compiler phases, being large programs, can take a very large 
elapsed time to run. Thus this method is used only for the error message editor which prints error 
messages on the user’s terminal. 


The second method uses a routine TASK, a C-callable interface to the TASK system call, to submit a 
program as a special, priority batch activity. The elapsed time for a TASK activity is typically much 
lower than for the same program run in time-sharing. The character string arguments are written onto a 
temporary file which is read by the startup routine when in batch. The completion code is returned as 
follows: if there is no argument to EXIT or the argument is 0, EXIT terminates normally, and TASK 
returns a status code of 0. Otherwise, EXIT aborts with the completion code as the abort code; the abort 
code is then returned in the status returned by TASK. 


The compiler phases can also be run as normal GCOS batch activities by the sequence of control 
cards shown in Figure 1. When these cards are submitted, IDENT and USERID cards are inserted at the 
beginning of the deck and the characters ’s’ and ’X’ are replaced by the user’s identification and the basic 
component of the source file name, respectively. Thus if the user is ’B’ and the source file is ’B/TEST.C’, 
the assembly-language output will be written onto the file ’B/TEST.G’ and the error messages will be 
written onto the file ’B/TEST.E’. The generation of the control cards and the submission of the batch job 
is performed by a time-sharing program (command). As the turn-around time for a normal batch job can 
be quite Mr en et ha cena a ind Sve Gr enee prnareny Wren oe, tap Nene, f8 capeno 
meee ee eres he Seer 


